Commit f9a87f61
Authored Sep 02, 2018 by Ruilong Liu
Committed Sep 02, 2018 by GitHub
Merge pull request #883 from codeWorm2015/metal
Metal
Parents: cbfc1d74 1bd51ef7
Showing 46 changed files with 904 additions and 350 deletions
metal/paddle-mobile-demo/paddle-mobile-demo/Base.lproj/Main.storyboard  +14 -15
metal/paddle-mobile-demo/paddle-mobile-demo/Net/MobileNetSSD.swift  +8 -3
metal/paddle-mobile-demo/paddle-mobile-demo/Net/Net.swift  +2 -2
metal/paddle-mobile-demo/paddle-mobile-demo/ViewController.swift  +5 -5
metal/paddle-mobile/paddle-mobile.xcodeproj/project.pbxproj  +0 -4
metal/paddle-mobile/paddle-mobile/Common/MetalExtension.swift  +10 -10
metal/paddle-mobile/paddle-mobile/Executor.swift  +17 -19
metal/paddle-mobile/paddle-mobile/Loader.swift  +2 -2
metal/paddle-mobile/paddle-mobile/Operators/BoxcoderOp.swift  +10 -10
metal/paddle-mobile/paddle-mobile/Operators/ConcatOp.swift  +3 -3
metal/paddle-mobile/paddle-mobile/Operators/ConvAddBatchNormReluOp.swift  +2 -2
metal/paddle-mobile/paddle-mobile/Operators/ConvAddOp.swift  +17 -17
metal/paddle-mobile/paddle-mobile/Operators/ConvBNReluOp.swift  +1 -1
metal/paddle-mobile/paddle-mobile/Operators/ConvOp.swift  +1 -1
metal/paddle-mobile/paddle-mobile/Operators/ConvTransposeOp.swift  +5 -3
metal/paddle-mobile/paddle-mobile/Operators/DepthwiseConvOp.swift  +1 -1
metal/paddle-mobile/paddle-mobile/Operators/DwConvBNReluOp.swift  +1 -1
metal/paddle-mobile/paddle-mobile/Operators/ElementwiseAddOp.swift  +6 -3
metal/paddle-mobile/paddle-mobile/Operators/FeedOp.swift  +1 -1
metal/paddle-mobile/paddle-mobile/Operators/Kernels/BatchNormKernel.swift  +52 -45
metal/paddle-mobile/paddle-mobile/Operators/Kernels/ConvAddBatchNormReluKernel.swift  +25 -15
metal/paddle-mobile/paddle-mobile/Operators/Kernels/ConvAddKernel.swift  +16 -8
metal/paddle-mobile/paddle-mobile/Operators/Kernels/ConvBNReluKernel.swift  +16 -10
metal/paddle-mobile/paddle-mobile/Operators/Kernels/ConvKernel.swift  +5 -3
metal/paddle-mobile/paddle-mobile/Operators/Kernels/ConvTransposeKernel.swift  +22 -4
metal/paddle-mobile/paddle-mobile/Operators/Kernels/ElementwiseAddKernel.swift  +13 -1
metal/paddle-mobile/paddle-mobile/Operators/Kernels/PoolKernel.swift  +7 -1
metal/paddle-mobile/paddle-mobile/Operators/Kernels/PreluKernel.swift  +19 -7
metal/paddle-mobile/paddle-mobile/Operators/Kernels/PriorBoxKernel.swift  +8 -7
metal/paddle-mobile/paddle-mobile/Operators/Kernels/ReluKernel.swift  +17 -11
metal/paddle-mobile/paddle-mobile/Operators/Kernels/ResizeKernel.swift  +0 -62
metal/paddle-mobile/paddle-mobile/Operators/Kernels/SoftmaxKernel.swift  +14 -10
metal/paddle-mobile/paddle-mobile/Operators/Kernels/metal/ConvAddMetal.metal  +118 -0
metal/paddle-mobile/paddle-mobile/Operators/Kernels/metal/ConvKernel.metal  +129 -0
metal/paddle-mobile/paddle-mobile/Operators/Kernels/metal/ConvTransposeKernel.metal  +127 -41
metal/paddle-mobile/paddle-mobile/Operators/Kernels/metal/Elementwise.metal  +41 -1
metal/paddle-mobile/paddle-mobile/Operators/Kernels/metal/PreluKernel.metal  +68 -0
metal/paddle-mobile/paddle-mobile/Operators/PoolOp.swift  +1 -1
metal/paddle-mobile/paddle-mobile/Operators/PreluOp.swift  +2 -2
metal/paddle-mobile/paddle-mobile/Operators/PriorBoxOp.swift  +3 -3
metal/paddle-mobile/paddle-mobile/Operators/ReluOp.swift  +1 -1
metal/paddle-mobile/paddle-mobile/Operators/ReshapeOp.swift  +4 -4
metal/paddle-mobile/paddle-mobile/Operators/SoftmaxOp.swift  +6 -4
metal/paddle-mobile/paddle-mobile/Operators/TransposeOp.swift  +2 -2
metal/paddle-mobile/paddle-mobile/framework/Tensor.swift  +65 -1
metal/paddle-mobile/paddle-mobile/framework/Texture.swift  +17 -3
metal/paddle-mobile-demo/paddle-mobile-demo/Base.lproj/Main.storyboard @ f9a87f61
@@ -19,10 +19,10 @@
 <rect key="frame" x="0.0" y="0.0" width="375" height="667"/>
 <autoresizingMask key="autoresizingMask" widthSizable="YES" heightSizable="YES"/>
 <subviews>
-<imageView userInteractionEnabled="NO" contentMode="scaleAspectFit" horizontalHuggingPriority="251" verticalHuggingPriority="251" ambiguous="YES" image="hand.jpg" translatesAutoresizingMaskIntoConstraints="NO" id="ZZh-fw-LwK">
+<imageView userInteractionEnabled="NO" contentMode="scaleAspectFit" horizontalHuggingPriority="251" verticalHuggingPriority="251" translatesAutoresizingMaskIntoConstraints="NO" id="ZZh-fw-LwK">
 <rect key="frame" x="0.0" y="20" width="375" height="247"/>
 </imageView>
-<label opaque="NO" userInteractionEnabled="NO" contentMode="left" horizontalHuggingPriority="251" verticalHuggingPriority="251" ambiguous="YES" text="Thread:" textAlignment="natural" lineBreakMode="tailTruncation" baselineAdjustment="alignBaselines" adjustsFontSizeToFit="NO" translatesAutoresizingMaskIntoConstraints="NO" id="2EB-m2-a3L">
+<label opaque="NO" userInteractionEnabled="NO" contentMode="left" horizontalHuggingPriority="251" verticalHuggingPriority="251" text="Thread:" textAlignment="natural" lineBreakMode="tailTruncation" baselineAdjustment="alignBaselines" adjustsFontSizeToFit="NO" translatesAutoresizingMaskIntoConstraints="NO" id="2EB-m2-a3L">
 <rect key="frame" x="10" y="538" width="68" height="24"/>
 <constraints>
 <constraint firstAttribute="width" constant="68" id="Q5J-tq-JSX"/>
@@ -32,19 +32,19 @@
 <nil key="textColor"/>
 <nil key="highlightedColor"/>
 </label>
-<pickerView contentMode="scaleToFill" ambiguous="YES" translatesAutoresizingMaskIntoConstraints="NO" id="DlO-dk-RMr">
+<pickerView contentMode="scaleToFill" translatesAutoresizingMaskIntoConstraints="NO" id="DlO-dk-RMr">
 <rect key="frame" x="88" y="510.5" width="287" height="80"/>
 <constraints>
 <constraint firstAttribute="height" constant="80" id="Sbi-05-Mwd"/>
 </constraints>
 </pickerView>
-<pickerView contentMode="scaleToFill" ambiguous="YES" translatesAutoresizingMaskIntoConstraints="NO" id="6MG-gv-hD5">
+<pickerView contentMode="scaleToFill" translatesAutoresizingMaskIntoConstraints="NO" id="6MG-gv-hD5">
 <rect key="frame" x="85" y="401" width="290" height="80"/>
 <constraints>
 <constraint firstAttribute="height" constant="80" id="yAL-JY-G6b"/>
 </constraints>
 </pickerView>
-<label opaque="NO" userInteractionEnabled="NO" contentMode="left" horizontalHuggingPriority="251" verticalHuggingPriority="251" ambiguous="YES" text="Models" textAlignment="natural" lineBreakMode="tailTruncation" baselineAdjustment="alignBaselines" adjustsFontSizeToFit="NO" translatesAutoresizingMaskIntoConstraints="NO" id="avL-VK-Kha">
+<label opaque="NO" userInteractionEnabled="NO" contentMode="left" horizontalHuggingPriority="251" verticalHuggingPriority="251" text="Models" textAlignment="natural" lineBreakMode="tailTruncation" baselineAdjustment="alignBaselines" adjustsFontSizeToFit="NO" translatesAutoresizingMaskIntoConstraints="NO" id="avL-VK-Kha">
 <rect key="frame" x="10" y="429" width="65" height="24"/>
 <constraints>
 <constraint firstAttribute="width" constant="65" id="6oA-g2-Xq4"/>
@@ -54,7 +54,7 @@
 <nil key="textColor"/>
 <nil key="highlightedColor"/>
 </label>
-<button opaque="NO" contentMode="scaleToFill" ambiguous="YES" contentHorizontalAlignment="center" contentVerticalAlignment="center" buttonType="roundedRect" showsTouchWhenHighlighted="YES" lineBreakMode="middleTruncation" translatesAutoresizingMaskIntoConstraints="NO" id="wUL-9N-u1V">
+<button opaque="NO" contentMode="scaleToFill" contentHorizontalAlignment="center" contentVerticalAlignment="center" buttonType="roundedRect" showsTouchWhenHighlighted="YES" lineBreakMode="middleTruncation" translatesAutoresizingMaskIntoConstraints="NO" id="wUL-9N-u1V">
 <rect key="frame" x="16" y="597" width="63.5" height="30"/>
 <color key="backgroundColor" white="0.0" alpha="1" colorSpace="custom" customColorSpace="genericGamma22GrayColorSpace"/>
 <state key="normal" title="Image">
@@ -64,7 +64,7 @@
 <action selector="selectImageAct:" destination="BYZ-38-t0r" eventType="touchUpInside" id="5uR-SM-fKO"/>
 </connections>
 </button>
-<button opaque="NO" contentMode="scaleToFill" ambiguous="YES" contentHorizontalAlignment="center" contentVerticalAlignment="center" buttonType="roundedRect" showsTouchWhenHighlighted="YES" lineBreakMode="middleTruncation" translatesAutoresizingMaskIntoConstraints="NO" id="XpL-9M-UOp">
+<button opaque="NO" contentMode="scaleToFill" contentHorizontalAlignment="center" contentVerticalAlignment="center" buttonType="roundedRect" showsTouchWhenHighlighted="YES" lineBreakMode="middleTruncation" translatesAutoresizingMaskIntoConstraints="NO" id="XpL-9M-UOp">
 <rect key="frame" x="109.5" y="597" width="63" height="30"/>
 <color key="backgroundColor" white="0.0" alpha="1" colorSpace="custom" customColorSpace="genericGamma22GrayColorSpace"/>
 <state key="normal" title="Load">
@@ -74,7 +74,7 @@
 <action selector="loadAct:" destination="BYZ-38-t0r" eventType="touchUpInside" id="fZ5-CQ-jCY"/>
 </connections>
 </button>
-<button opaque="NO" contentMode="scaleToFill" ambiguous="YES" contentHorizontalAlignment="center" contentVerticalAlignment="center" buttonType="roundedRect" showsTouchWhenHighlighted="YES" lineBreakMode="middleTruncation" translatesAutoresizingMaskIntoConstraints="NO" id="R90-Yf-S6g">
+<button opaque="NO" contentMode="scaleToFill" contentHorizontalAlignment="center" contentVerticalAlignment="center" buttonType="roundedRect" showsTouchWhenHighlighted="YES" lineBreakMode="middleTruncation" translatesAutoresizingMaskIntoConstraints="NO" id="R90-Yf-S6g">
 <rect key="frame" x="202.5" y="597" width="63.5" height="30"/>
 <color key="backgroundColor" white="0.0" alpha="1" colorSpace="custom" customColorSpace="genericGamma22GrayColorSpace"/>
 <state key="normal" title="Predict">
@@ -84,7 +84,7 @@
 <action selector="predictAct:" destination="BYZ-38-t0r" eventType="touchUpInside" id="Iyy-sY-gt4"/>
 </connections>
 </button>
-<button opaque="NO" contentMode="scaleToFill" ambiguous="YES" contentHorizontalAlignment="center" contentVerticalAlignment="center" buttonType="roundedRect" showsTouchWhenHighlighted="YES" lineBreakMode="middleTruncation" translatesAutoresizingMaskIntoConstraints="NO" id="a3K-ri-NVs">
+<button opaque="NO" contentMode="scaleToFill" contentHorizontalAlignment="center" contentVerticalAlignment="center" buttonType="roundedRect" showsTouchWhenHighlighted="YES" lineBreakMode="middleTruncation" translatesAutoresizingMaskIntoConstraints="NO" id="a3K-ri-NVs">
 <rect key="frame" x="296" y="597" width="63" height="30"/>
 <color key="backgroundColor" white="0.0" alpha="1" colorSpace="custom" customColorSpace="genericGamma22GrayColorSpace"/>
 <state key="normal" title="Clear">
@@ -94,7 +94,7 @@
 <action selector="clearAct:" destination="BYZ-38-t0r" eventType="touchUpInside" id="JYf-UX-rCR"/>
 </connections>
 </button>
-<view contentMode="scaleToFill" ambiguous="YES" translatesAutoresizingMaskIntoConstraints="NO" id="w7H-Sk-Rai">
+<view contentMode="scaleToFill" translatesAutoresizingMaskIntoConstraints="NO" id="w7H-Sk-Rai">
 <rect key="frame" x="79.5" y="597" width="30" height="30"/>
 <color key="backgroundColor" white="1" alpha="1" colorSpace="custom" customColorSpace="genericGamma22GrayColorSpace"/>
 <constraints>
@@ -102,7 +102,7 @@
 <constraint firstAttribute="width" constant="30" id="vYd-Fc-KAj"/>
 </constraints>
 </view>
-<view contentMode="scaleToFill" ambiguous="YES" translatesAutoresizingMaskIntoConstraints="NO" id="T4O-nx-ciH">
+<view contentMode="scaleToFill" translatesAutoresizingMaskIntoConstraints="NO" id="T4O-nx-ciH">
 <rect key="frame" x="266" y="597" width="30" height="30"/>
 <color key="backgroundColor" white="1" alpha="1" colorSpace="custom" customColorSpace="genericGamma22GrayColorSpace"/>
 <constraints>
@@ -110,7 +110,7 @@
 <constraint firstAttribute="width" constant="30" id="fXE-S7-ZXL"/>
 </constraints>
 </view>
-<view contentMode="scaleToFill" ambiguous="YES" translatesAutoresizingMaskIntoConstraints="NO" id="976-fk-Kx2">
+<view contentMode="scaleToFill" translatesAutoresizingMaskIntoConstraints="NO" id="976-fk-Kx2">
 <rect key="frame" x="172.5" y="597" width="30" height="30"/>
 <color key="backgroundColor" white="1" alpha="1" colorSpace="custom" customColorSpace="genericGamma22GrayColorSpace"/>
 <constraints>
@@ -118,7 +118,7 @@
 <constraint firstAttribute="width" constant="30" id="L4p-hP-s5C"/>
 </constraints>
 </view>
-<label opaque="NO" userInteractionEnabled="NO" contentMode="left" horizontalHuggingPriority="251" verticalHuggingPriority="251" ambiguous="YES" text="耗时:" lineBreakMode="tailTruncation" numberOfLines="0" baselineAdjustment="alignBaselines" adjustsFontSizeToFit="NO" translatesAutoresizingMaskIntoConstraints="NO" id="m5L-O7-P31">
+<label opaque="NO" userInteractionEnabled="NO" contentMode="left" horizontalHuggingPriority="251" verticalHuggingPriority="251" text="耗时:" lineBreakMode="tailTruncation" numberOfLines="0" baselineAdjustment="alignBaselines" adjustsFontSizeToFit="NO" translatesAutoresizingMaskIntoConstraints="NO" id="m5L-O7-P31">
 <rect key="frame" x="15" y="277" width="350" height="38"/>
 <constraints>
 <constraint firstAttribute="height" constant="38" id="6SS-sb-7I2"/>
@@ -133,7 +133,7 @@
 <constraint firstAttribute="width" secondItem="4ey-Xr-U4e" secondAttribute="height" multiplier="6.5:1" id="8c5-FF-lB9"/>
 </constraints>
 </imageView>
-<textView clipsSubviews="YES" multipleTouchEnabled="YES" contentMode="scaleToFill" ambiguous="YES" editable="NO" text="结果:" textAlignment="natural" translatesAutoresizingMaskIntoConstraints="NO" id="VQn-bS-fWp">
+<textView clipsSubviews="YES" multipleTouchEnabled="YES" contentMode="scaleToFill" editable="NO" text="结果:" textAlignment="natural" translatesAutoresizingMaskIntoConstraints="NO" id="VQn-bS-fWp">
 <rect key="frame" x="10" y="323" width="355" height="70"/>
 <color key="backgroundColor" white="1" alpha="1" colorSpace="custom" customColorSpace="genericGamma22GrayColorSpace"/>
 <constraints>
@@ -203,7 +203,6 @@
 </scene>
 </scenes>
 <resources>
-<image name="hand.jpg" width="564" height="664"/>
 <image name="paddle-mobile.png" width="402" height="62"/>
 </resources>
 </document>
metal/paddle-mobile-demo/paddle-mobile-demo/Net/MobileNetSSD.swift @ f9a87f61
@@ -33,7 +33,7 @@ class MobileNet_ssd_hand: Net{
 return "\(res)"
 }
-func fetchResult(paddleMobileRes: ResultHolder<Float32>) -> [Float32] {
+func fetchResult(paddleMobileRes: ResultHolder) -> [Float32] {
 guard let interRes = paddleMobileRes.intermediateResults else {
 fatalError(" need have inter result ")
@@ -47,13 +47,17 @@ class MobileNet_ssd_hand: Net{
 fatalError()
 }
-var scoreFormatArr: [Float32] = score.metalTexture.realNHWC(dim: (n: score.originDim[0], h: score.originDim[1], w: score.originDim[2], c: score.originDim[3]))
+var scoreFormatArr: [Float32] = score.metalTexture.realNHWC(dim: (n: score.padToFourDim[0], h: score.padToFourDim[1], w: score.padToFourDim[2], c: score.padToFourDim[3]))
+print("score: ")
+print(scoreFormatArr.strideArray())
 var bboxArr = bbox.metalTexture.float32Array()
+print("bbox: ")
+print(bboxArr.strideArray())
 let nmsCompute = NMSCompute.init()
 nmsCompute.scoreThredshold = 0.01
-nmsCompute.nmsTopK = 200
+nmsCompute.nmsTopK = 400
 nmsCompute.keepTopK = 200
 nmsCompute.nmsEta = 1.0
 nmsCompute.nmsThreshold = 0.45
@@ -68,6 +72,7 @@ class MobileNet_ssd_hand: Net{
 let output: [Float32] = result.map { $0.floatValue }
 return output
 }
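The hunk above only sets parameters on `NMSCompute` (`scoreThredshold`, `nmsTopK`, `keepTopK`, `nmsThreshold`); the implementation itself is not part of this diff. As a rough illustration of what those parameters usually control, here is a minimal single-class greedy non-maximum suppression sketch. The `Box` type, `iou`, and `greedyNMS` are hypothetical names introduced for this example, and the adaptive-threshold behaviour of `nmsEta` is left out (a value of 1.0, as in the diff, normally means the threshold stays fixed).

```swift
// Hypothetical axis-aligned box; not a type from paddle-mobile.
struct Box {
    var xMin: Float, yMin: Float, xMax: Float, yMax: Float
    var area: Float { return max(0, xMax - xMin) * max(0, yMax - yMin) }
}

// Intersection over union of two boxes, in [0, 1].
func iou(_ a: Box, _ b: Box) -> Float {
    let w = max(0, min(a.xMax, b.xMax) - max(a.xMin, b.xMin))
    let h = max(0, min(a.yMax, b.yMax) - max(a.yMin, b.yMin))
    let inter = w * h
    let union = a.area + b.area - inter
    return union > 0 ? inter / union : 0
}

// Greedy NMS: returns the indices of the boxes that survive, best score first.
func greedyNMS(boxes: [Box], scores: [Float],
               scoreThreshold: Float, nmsTopK: Int,
               nmsThreshold: Float, keepTopK: Int) -> [Int] {
    precondition(boxes.count == scores.count)
    // 1. Drop low scores, sort descending, keep at most nmsTopK candidates.
    let candidates = scores.indices
        .filter { scores[$0] > scoreThreshold }
        .sorted { scores[$0] > scores[$1] }
        .prefix(max(0, nmsTopK))
    // 2. Keep a candidate only if it does not overlap an already kept box too much.
    var kept: [Int] = []
    for i in candidates {
        let suppressed = kept.contains { iou(boxes[$0], boxes[i]) > nmsThreshold }
        if !suppressed { kept.append(i) }
    }
    // 3. Cap the final number of detections.
    return Array(kept.prefix(max(0, keepTopK)))
}
```

With the values from the diff, a call would look like `greedyNMS(boxes: b, scores: s, scoreThreshold: 0.01, nmsTopK: 400, nmsThreshold: 0.45, keepTopK: 200)`. Raising `nmsTopK` from 200 to 400 lets more candidates enter the suppression step while the final cap stays at 200.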
metal/paddle-mobile-demo/paddle-mobile-demo/Net/Net.swift @ f9a87f61
@@ -37,7 +37,7 @@ protocol Net {
 var preprocessKernel: CusomKernel { get }
 func getTexture(image: CGImage, getTexture: @escaping (MTLTexture) -> Void)
 func resultStr(res: [Float]) -> String
-func fetchResult(paddleMobileRes: ResultHolder<Float32>) -> [Float32]
+func fetchResult(paddleMobileRes: ResultHolder) -> [Float32]
 mutating func load() throws
 func predict(inTexture: MTLTexture, completion: @escaping ((time: TimeInterval, resultArray: [Float32])) -> Void) throws
@@ -82,7 +82,7 @@ extension Net {
 }
 }
-func fetchResult(paddleMobileRes: ResultHolder<Float32>) -> [Float32] {
+func fetchResult(paddleMobileRes: ResultHolder) -> [Float32] {
 return paddleMobileRes.resultArr
 }
metal/paddle-mobile-demo/paddle-mobile-demo/ViewController.swift @ f9a87f61
@@ -19,17 +19,17 @@ import MetalPerformanceShaders
 let threadSupport = [1]
-let modelHelperMap: [SupportModel : Net] = [.mobilenet_ssd : MobileNet_ssd_hand.init(), .genet : Genet.init()]
+let modelHelperMap: [SupportModel : Net] = [.mobilenet : MobileNet.init(), .mobilenet_ssd : MobileNet_ssd_hand.init(), .genet : Genet.init()]
 //, .genet : Genet.init()
 //let modelHelperMap: [SupportModel : Net] = [.mobilenet : MobileNet.init(), .mobilenet_ssd : MobileNet_ssd_hand.init()]
 enum SupportModel: String{
-// case mobilenet = "mobilenet"
+case mobilenet = "mobilenet"
 case mobilenet_ssd = "mobilenetssd"
 case genet = "genet"
 static func supportedModels() -> [SupportModel] {
-// .mobilenet,
-return [.mobilenet_ssd, .genet]
+//
+return [.mobilenet, .mobilenet_ssd, .genet]
 }
 }
@@ -87,7 +87,7 @@ class ViewController: UIViewController {
 fatalError()
 }
-// print(result.resultArray)
+print(result.resultArray.strideArray())
 if i == max - 1 {
 let time = Date.init().timeIntervalSince(startDate)
 DispatchQueue.main.async {
metal/paddle-mobile/paddle-mobile.xcodeproj/project.pbxproj @ f9a87f61
@@ -41,7 +41,6 @@
 FC0E2DBE20EE460D009C1FAC /* BatchNormKernel.swift in Sources */ = {isa = PBXBuildFile; fileRef = FC0E2DBD20EE460D009C1FAC /* BatchNormKernel.swift */; };
 FC0E2DC020EE461F009C1FAC /* ElementwiseAddKernel.swift in Sources */ = {isa = PBXBuildFile; fileRef = FC0E2DBF20EE461F009C1FAC /* ElementwiseAddKernel.swift */; };
 FC1B16B320EC9A4F00678B91 /* Kernels.metal in Sources */ = {isa = PBXBuildFile; fileRef = FC1B16B220EC9A4F00678B91 /* Kernels.metal */; };
-FC1B186620ECF1C600678B91 /* ResizeKernel.swift in Sources */ = {isa = PBXBuildFile; fileRef = FC1B186520ECF1C600678B91 /* ResizeKernel.swift */; };
 FC3602CC2108819F00FACB58 /* PaddleMobileUnitTest.swift in Sources */ = {isa = PBXBuildFile; fileRef = FC3602CB2108819F00FACB58 /* PaddleMobileUnitTest.swift */; };
 FC4CB74920F0B954007C0C6D /* ConvKernel.metal in Sources */ = {isa = PBXBuildFile; fileRef = FC4CB74820F0B954007C0C6D /* ConvKernel.metal */; };
 FC4CB74B20F12C30007C0C6D /* ProgramOptimize.swift in Sources */ = {isa = PBXBuildFile; fileRef = FC4CB74A20F12C30007C0C6D /* ProgramOptimize.swift */; };
@@ -133,7 +132,6 @@
 FC0E2DBD20EE460D009C1FAC /* BatchNormKernel.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = BatchNormKernel.swift; sourceTree = "<group>"; };
 FC0E2DBF20EE461F009C1FAC /* ElementwiseAddKernel.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ElementwiseAddKernel.swift; sourceTree = "<group>"; };
 FC1B16B220EC9A4F00678B91 /* Kernels.metal */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.metal; path = Kernels.metal; sourceTree = "<group>"; };
-FC1B186520ECF1C600678B91 /* ResizeKernel.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = ResizeKernel.swift; sourceTree = "<group>"; };
 FC27990D21341016000B6BAD /* BoxCoder.metal */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.metal; path = BoxCoder.metal; sourceTree = "<group>"; };
 FC3602CB2108819F00FACB58 /* PaddleMobileUnitTest.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = PaddleMobileUnitTest.swift; sourceTree = "<group>"; };
 FC4CB74820F0B954007C0C6D /* ConvKernel.metal */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.metal; path = ConvKernel.metal; sourceTree = "<group>"; };
@@ -326,7 +324,6 @@
 FCEB6837212F00B100D2448E /* metal */,
 FCDDC6C7212FA3CA00E5EF74 /* ConvTransposeKernel.swift */,
 FC0E2DBB20EE45FE009C1FAC /* ConvKernel.swift */,
-FC1B186520ECF1C600678B91 /* ResizeKernel.swift */,
 FC0E2DB920EE3B8D009C1FAC /* ReluKernel.swift */,
 FC0E2DBD20EE460D009C1FAC /* BatchNormKernel.swift */,
 FC0E2DBF20EE461F009C1FAC /* ElementwiseAddKernel.swift */,
@@ -506,7 +503,6 @@
 FC039BBB20E11CC20081E9F8 /* ProgramDesc.swift in Sources */,
 FC9D037920E229E4000F735A /* OpParam.swift in Sources */,
 FC3602CC2108819F00FACB58 /* PaddleMobileUnitTest.swift in Sources */,
-FC1B186620ECF1C600678B91 /* ResizeKernel.swift in Sources */,
 FCF2D73820E64E70007AC5F5 /* Kernel.swift in Sources */,
 FCDDC6CC212FDFDB00E5EF74 /* ReluKernel.metal in Sources */,
 FC0226562138F33800F395E2 /* TransposeKernel.metal in Sources */,
metal/paddle-mobile/paddle-mobile/Common/MetalExtension.swift @ f9a87f61
@@ -113,7 +113,7 @@ extension MTLDevice {
 return tensor
 }
-func tensor2texture<P>(value: [P], dim: [Int], transpose: [Int] = [0, 1, 2, 3]) -> MTLTexture {
+func tensor2texture<P>(value: [P], dim: [Int], transpose: [Int] = [0, 1, 2, 3], inComputePrecision: ComputePrecision = .Float32) -> MTLTexture {
 if value.count > 0 {
 assert(value.count == dim.reduce(1) { $0 * $1 })
 }
@@ -129,7 +129,13 @@ extension MTLDevice {
 textureDesc.height = ndim[1]
 textureDesc.depth = 1
 textureDesc.usage = [.shaderRead, .shaderWrite]
-textureDesc.pixelFormat = .rgba32Float
+if inComputePrecision == .Float16 {
+textureDesc.pixelFormat = .rgba16Float
+} else if inComputePrecision == .Float32 {
+textureDesc.pixelFormat = .rgba32Float
+}
 textureDesc.textureType = .type2DArray
 textureDesc.storageMode = .shared
 textureDesc.cpuCacheMode = .defaultCache
@@ -354,13 +360,8 @@ public extension MTLTexture {
 }
 // n c h w - dim
-func toTensor(dim: (n: Int, c: Int, h: Int, w: Int), texturePrecision: ComputePrecision = .Float16) -> [Float32] {
-// print("origin dim: \(dim)")
-print("texture: ")
-print(self)
+func toTensor(dim: (n: Int, c: Int, h: Int, w: Int)) -> [Float32] {
 var textureArray: [Float32]
 // if texturePrecision == .Float16
 if pixelFormat == .rgba32Float {
 textureArray = floatArray { (i: Float32) -> Float32 in
 return i
@@ -388,11 +389,10 @@ public extension MTLTexture {
 }
 }
 }
-print(" tensor count -- \(output.count)")
 return output
 }
-func realNHWC(dim: (n: Int, h: Int, w: Int, c: Int), texturePrecision: ComputePrecision = .Float16) -> [Float32] {
+func realNHWC(dim: (n: Int, h: Int, w: Int, c: Int)) -> [Float32] {
 // print("origin dim: \(dim)")
 // print("texture: ")
 // print(self)
metal/paddle-mobile/paddle-mobile/Executor.swift @ f9a87f61
@@ -14,18 +14,18 @@
 import Foundation
-let testTo = 61
+let testTo = 161
 var isTest = false
-let computePrecision: ComputePrecision = .Float32
+let computePrecision: ComputePrecision = .Float16
-public class ResultHolder<P: PrecisionType> {
+public class ResultHolder {
 public let dim: [Int]
-public let resultArr: [P]
+public let resultArr: [Float32]
 public var intermediateResults: [String : [Variant]]?
 public let elapsedTime: Double
-public init(inDim: [Int], inResult: [P], inElapsedTime: Double, inIntermediateResults: [String : [Variant]]? = nil) {
+public init(inDim: [Int], inResult: [Float32], inElapsedTime: Double, inIntermediateResults: [String : [Variant]]? = nil) {
 dim = inDim
 resultArr = inResult
 elapsedTime = inElapsedTime
@@ -78,7 +78,7 @@ public class Executor<P: PrecisionType> {
 }
 }
-public func predict(input: MTLTexture, dim: [Int], completionHandle: @escaping (ResultHolder<P>) -> Void, preProcessKernle: CusomKernel? = nil, except: Int = 0) throws {
+public func predict(input: MTLTexture, dim: [Int], completionHandle: @escaping (ResultHolder) -> Void, preProcessKernle: CusomKernel? = nil, except: Int = 0) throws {
 guard let buffer = queue.makeCommandBuffer() else {
 throw PaddleMobileError.predictError(message: "CommandBuffer is nil")
 }
@@ -114,12 +114,10 @@ public class Executor<P: PrecisionType> {
 buffer.addCompletedHandler { (commandbuffer) in
 // let inputArr = resInput.floatArray(res: { (p:P) -> P in
 // return p
 // })
 // print(inputArr.strideArray())
 // let inputArr = resInput.toTensor(dim: (n: dim[0], c: dim[3], h: dim[1], w: dim[2]))
 //// print(inputArr.strideArray())
 //
-// writeToLibrary(fileName: "genet_input_hand", array: inputArr)
+// writeToLibrary(fileName: "test_image_ssd", array: inputArr)
 // print("write to library done")
 // return
 // print(inputArr)
@@ -133,23 +131,23 @@ public class Executor<P: PrecisionType> {
 print(" 第 \(i) 个 op: ")
 op.delogOutput()
 }
 // self.ops[59].delogOutput()
 // return;
 // self.ops[testTo - 2].delogOutput()
 // self.ops[testTo - 1].delogOutput()
 // self.ops[60].delogOutput()
-return
+// return
 let afterDate = Date.init()
-var resultHolder: ResultHolder<P>
+var resultHolder: ResultHolder
 if except > 0 {
-resultHolder = ResultHolder<P>.init(inDim: [], inResult: [], inElapsedTime: afterDate.timeIntervalSince(beforeDate), inIntermediateResults: outputTextures)
+resultHolder = ResultHolder.init(inDim: [], inResult: [], inElapsedTime: afterDate.timeIntervalSince(beforeDate), inIntermediateResults: outputTextures)
 } else {
 let outputVar: Variant = self.program.scope.output()!
 let output: Texture<P> = outputVar as! Texture<P>
-resultHolder = ResultHolder<P>.init(inDim: output.dim.dims, inResult: output.metalTexture.floatArray(res: { (p: P) -> P in
-return p
-}), inElapsedTime: afterDate.timeIntervalSince(beforeDate))
+resultHolder = ResultHolder.init(inDim: output.dim.dims, inResult: output.toTensor(), inElapsedTime: afterDate.timeIntervalSince(beforeDate))
 }
 completionHandle(resultHolder)
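In the Executor.swift hunks, `computePrecision` moves to `.Float16` while `ResultHolder.resultArr` becomes a plain `[Float32]`, so half-precision texture contents have to be widened somewhere on read-back. How the repository does that (presumably in `toTensor()` in framework/Texture.swift) is not visible in this part of the diff. The sketch below shows one portable way to widen raw IEEE 754 binary16 bit patterns to `Float32`; `float32(fromHalfBits:)` and `widen(halfBits:)` are hypothetical helpers for illustration, not functions from paddle-mobile.

```swift
// Converts one IEEE 754 binary16 bit pattern to Float32 (hypothetical helper).
func float32(fromHalfBits h: UInt16) -> Float32 {
    let sign = UInt32(h & 0x8000) << 16        // sign bit moved to bit 31
    let exponent = UInt32(h & 0x7C00) >> 10    // 5-bit biased exponent
    let mantissa = UInt32(h & 0x03FF)          // 10-bit fraction
    let bits: UInt32
    if exponent == 0 {
        if mantissa == 0 {
            bits = sign                        // signed zero
        } else {
            // Subnormal half: shift until the implicit leading 1 appears.
            var m = mantissa
            var e: UInt32 = 113                // 127 - 15 + 1
            while (m & 0x0400) == 0 {
                m <<= 1
                e -= 1
            }
            bits = sign | (e << 23) | ((m & 0x03FF) << 13)
        }
    } else if exponent == 0x1F {
        bits = sign | 0x7F80_0000 | (mantissa << 13)   // infinity or NaN
    } else {
        bits = sign | ((exponent + 112) << 23) | (mantissa << 13)  // re-bias 15 -> 127
    }
    return Float32(bitPattern: bits)
}

// Widens a whole buffer of half bit patterns, e.g. bytes read from an rgba16Float texture.
func widen(halfBits: [UInt16]) -> [Float32] {
    return halfBits.map { float32(fromHalfBits: $0) }
}
```

On Apple platforms the same conversion is typically done in bulk with Accelerate or the native half type where available; the bit-level version is shown only because it has no dependencies.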
metal/paddle-mobile/paddle-mobile/Loader.swift @ f9a87f61
@@ -159,7 +159,7 @@ public class Loader<P: PrecisionType> {
 } catch let error {
 throw error
 }
-tensor.convert(to: DataLayout.NHWC())
+// tensor.convert(to: DataLayout.NHWC())
 // tensor.initBuffer(device: device)
 scope[varDesc.name] = tensor
 } else {
@@ -168,7 +168,7 @@ public class Loader<P: PrecisionType> {
 }
 } else {
 if varDesc.name == fetchKey {
-scope[varDesc.name] = ResultHolder<P>.init(inDim: [], inResult: [], inElapsedTime: 0.0)
+scope[varDesc.name] = ResultHolder.init(inDim: [], inResult: [], inElapsedTime: 0.0)
 } else if varDesc.name == feedKey {
 }
 }
metal/paddle-mobile/paddle-mobile/Operators/BoxcoderOp.swift @ f9a87f61
@@ -59,28 +59,28 @@ class BoxcoderOp<P: PrecisionType>: Operator<BoxcoderKernel<P>, BoxcoderParam<P>
 func delogOutput() {
 print(" \(type) output: ")
-// let priorBoxOriginDim = para.priorBox.originDim
-// let priorBoxArray: [Float32] = para.priorBox.metalTexture.realNHWC(dim: (n: priorBoxOriginDim[0], h: priorBoxOriginDim[1], w: priorBoxOriginDim[2], c: priorBoxOriginDim[3]))
+// let priorBoxpadToFourDim = para.priorBox.padToFourDim
+// let priorBoxArray: [Float32] = para.priorBox.metalTexture.realNHWC(dim: (n: priorBoxpadToFourDim[0], h: priorBoxpadToFourDim[1], w: priorBoxpadToFourDim[2], c: priorBoxpadToFourDim[3]))
 // print(" prior box ")
 // print(priorBoxArray.strideArray())
 //
-// let priorBoxVarOriginDim = para.priorBoxVar.originDim
-// let priorBoxVarArray: [Float32] = para.priorBoxVar.metalTexture.realNHWC(dim: (n: priorBoxVarOriginDim[0], h: priorBoxVarOriginDim[1], w: priorBoxVarOriginDim[2], c: priorBoxVarOriginDim[3]))
+// let priorBoxVarpadToFourDim = para.priorBoxVar.padToFourDim
+// let priorBoxVarArray: [Float32] = para.priorBoxVar.metalTexture.realNHWC(dim: (n: priorBoxVarpadToFourDim[0], h: priorBoxVarpadToFourDim[1], w: priorBoxVarpadToFourDim[2], c: priorBoxVarpadToFourDim[3]))
 // print(" prior box var ")
 // print(priorBoxVarArray.strideArray())
 //
-// let targetBoxOriginDim = para.targetBox.originDim
-// let targetBoxArray: [Float32] = para.targetBox.metalTexture.realNHWC(dim: (n: targetBoxOriginDim[0], h: targetBoxOriginDim[1], w: targetBoxOriginDim[2], c: targetBoxOriginDim[3]))
+// let targetBoxpadToFourDim = para.targetBox.padToFourDim
+// let targetBoxArray: [Float32] = para.targetBox.metalTexture.realNHWC(dim: (n: targetBoxpadToFourDim[0], h: targetBoxpadToFourDim[1], w: targetBoxpadToFourDim[2], c: targetBoxpadToFourDim[3]))
 // print(" target box ")
 // print(targetBoxArray.strideArray())
-let targetBoxOriginDim = para.targetBox.originDim
-let targetBoxArray = para.targetBox.metalTexture.realNHWC(dim: (n: targetBoxOriginDim[0], h: targetBoxOriginDim[1], w: targetBoxOriginDim[2], c: targetBoxOriginDim[3]), texturePrecision: computePrecision)
+let targetBoxpadToFourDim = para.targetBox.padToFourDim
+let targetBoxArray = para.targetBox.metalTexture.realNHWC(dim: (n: targetBoxpadToFourDim[0], h: targetBoxpadToFourDim[1], w: targetBoxpadToFourDim[2], c: targetBoxpadToFourDim[3]))
 print(" target box ")
 print(targetBoxArray.strideArray())
-let originDim = para.output.originDim
-let outputArray: [Float32] = para.output.metalTexture.realNHWC(dim: (n: originDim[0], h: originDim[1], w: originDim[2], c: originDim[3]), texturePrecision: computePrecision)
+let padToFourDim = para.output.padToFourDim
+let outputArray: [Float32] = para.output.metalTexture.realNHWC(dim: (n: padToFourDim[0], h: padToFourDim[1], w: padToFourDim[2], c: padToFourDim[3]))
 print(" output ")
 print(outputArray.strideArray())
 }
metal/paddle-mobile/paddle-mobile/Operators/ConcatOp.swift
...
...
@@ -65,12 +65,12 @@ class ConcatOp<P: PrecisionType>: Operator<ConcatKernel<P>, ConcatParam<P>>, Run
   func delogOutput() {
     print(" \(type) output: ")
-    let originDim = para.output.originDim
+    let padToFourDim = para.output.padToFourDim
     if para.output.transpose == [0, 1, 2, 3] {
-      let outputArray: [Float32] = para.output.metalTexture.realNHWC(dim: (n: originDim[0], h: originDim[1], w: originDim[2], c: originDim[3]), texturePrecision: computePrecision)
+      let outputArray: [Float32] = para.output.metalTexture.realNHWC(dim: (n: padToFourDim[0], h: padToFourDim[1], w: padToFourDim[2], c: padToFourDim[3]))
       print(outputArray.strideArray())
     } else if para.output.transpose == [0, 2, 3, 1] {
-      print(para.output.metalTexture.toTensor(dim: (n: originDim[0], c: originDim[1], h: originDim[2], w: originDim[3]), texturePrecision: computePrecision).strideArray())
+      print(para.output.metalTexture.toTensor(dim: (n: padToFourDim[0], c: padToFourDim[1], h: padToFourDim[2], w: padToFourDim[3])).strideArray())
     } else {
       fatalError(" not implemet")
     }
...
...
metal/paddle-mobile/paddle-mobile/Operators/ConvAddBatchNormReluOp.swift
...
...
@@ -34,7 +34,7 @@ class ConvAddBatchNormReluParam<P: PrecisionType>: OpParam {
       scale = try ConvAddBatchNormReluParam.inputScale(inputs: opDesc.paraInputs, from: inScope)
       mean = try ConvAddBatchNormReluParam.inputMean(inputs: opDesc.paraInputs, from: inScope)
-      y = try ConvAddBatchNormReluParam.inputY(inputs: opDesc.inputs, from: inScope)
+      y = try ConvAddBatchNormReluParam.inputY(inputs: opDesc.paraInputs, from: inScope)
     } catch let error {
       throw error
     }
...
...
@@ -112,7 +112,7 @@ class ConvAddBatchNormReluOp<P: PrecisionType>: Operator<ConvAddBatchNormReluKer
   func delogOutput() {
     print(" conv add batchnorm relu output ")
-    print(para.output.metalTexture.toTensor(dim: (n: para.output.originDim[0], c: para.output.originDim[1], h: para.output.originDim[2], w: para.output.originDim[3])).strideArray())
+    print(para.output.toTensor().strideArray())
     // let _: P? = para.input.metalTexture.logDesc(header: "conv add batchnorm relu input: ", stridable: false)
     // para.filter.logDataPointer(header: "filter data pointer: ")
     // print("filter: \(para.filter)")
...
...
metal/paddle-mobile/paddle-mobile/Operators/ConvAddOp.swift
...
...
@@ -93,24 +93,24 @@ class ConvAddOp<P: PrecisionType>: Operator<ConvAddKernel<P>, ConvAddParam<P>>,
   }
   func delogOutput() {
-    // print("op \(type): ")
-    // print(" padding: ")
-    // print(para.paddings)
-    // print("stride: ")
-    // print(para.stride)
-    // print("dilations: ")
-    // print(para.dilations)
-    // print(" para input dim: ")
-    // print(para.input.dim)
-    // print(" para filter dim: ")
-    // print(para.filter.dim)
-    // print(" para output dim: ")
-    // print(para.output.dim)
-    // print(" biase: ")
-    // let biase: [Float32] = para.y.buffer.array()
-    // print(biase)
+    print(" padding: ")
+    print(para.paddings)
+    print("stride: ")
+    print(para.stride)
+    print("dilations: ")
+    print(para.dilations)
     print(" \(type) output: ")
+    print(" para input dim: ")
+    print(para.input.dim)
+    print(" para filter dim: ")
+    print(para.filter.dim)
+    print(" para output dim: ")
+    print(para.output.dim)
+    print(" biase: ")
+    let biase: [Float32] = para.y.buffer.array()
+    print(biase)
-    print(para.output.metalTexture.toTensor(dim: (n: para.output.tensorDim[0], c: para.output.tensorDim[1], h: para.output.tensorDim[2], w: para.output.tensorDim[3]), texturePrecision: computePrecision).strideArray())
+    print(para.output.metalTexture.toTensor(dim: (n: para.output.tensorDim[0], c: para.output.tensorDim[1], h: para.output.tensorDim[2], w: para.output.tensorDim[3])).strideArray())
   }
 }
metal/paddle-mobile/paddle-mobile/Operators/ConvBNReluOp.swift
...
...
@@ -110,7 +110,7 @@ class ConvBNReluOp<P: PrecisionType>: Operator<ConvBNReluKernel<P>, ConvBNReluPa
   func delogOutput() {
     print(" \(type) output: ")
-    print(para.output.metalTexture.toTensor(dim: (n: para.output.originDim[0], c: para.output.originDim[1], h: para.output.originDim[2], w: para.output.originDim[3]), texturePrecision: computePrecision).strideArray())
+    print(para.output.metalTexture.toTensor(dim: (n: para.output.padToFourDim[0], c: para.output.padToFourDim[1], h: para.output.padToFourDim[2], w: para.output.padToFourDim[3])).strideArray())
   }
 }
metal/paddle-mobile/paddle-mobile/Operators/ConvOp.swift
...
...
@@ -75,7 +75,7 @@ class ConvOp<P: PrecisionType>: Operator<ConvKernel<P>, ConvParam<P>>, Runable,
   func delogOutput() {
     print("conv output : ")
-    print(para.output.metalTexture.toTensor(dim: (n: para.output.originDim[0], c: para.output.originDim[1], h: para.output.originDim[2], w: para.output.originDim[3])).strideArray())
+    print(para.output.toTensor().strideArray())
     // let _: Float16? = para.output.metalTexture.logDesc()
   }
 }
metal/paddle-mobile/paddle-mobile/Operators/ConvTransposeOp.swift
...
...
@@ -43,13 +43,15 @@ class ConvTransposeOp<P: PrecisionType>: Operator<ConvTransposeKernel<P>, ConvTr
   }
   func delogOutput() {
     print(" \(type) output: ")
-    let originDim = para.output.originDim
+    let padToFourDim = para.output.padToFourDim
     if para.output.transpose == [0, 1, 2, 3] {
-      let outputArray: [Float32] = para.output.metalTexture.realNHWC(dim: (n: originDim[0], h: originDim[1], w: originDim[2], c: originDim[3]), texturePrecision: computePrecision)
+      let outputArray: [Float32] = para.output.metalTexture.realNHWC(dim: (n: padToFourDim[0], h: padToFourDim[1], w: padToFourDim[2], c: padToFourDim[3]))
       print(outputArray.strideArray())
     } else if para.output.transpose == [0, 2, 3, 1] {
-      print(para.output.metalTexture.toTensor(dim: (n: para.output.tensorDim[0], c: para.output.tensorDim[1], h: para.output.tensorDim[2], w: para.output.tensorDim[3]), texturePrecision: computePrecision).strideArray())
+      let output = para.output.metalTexture.toTensor(dim: (n: para.output.tensorDim[0], c: para.output.tensorDim[1], h: para.output.tensorDim[2], w: para.output.tensorDim[3]))
+      print(output.strideArray())
     } else {
       print(" not implement")
     }
...
...
metal/paddle-mobile/paddle-mobile/Operators/DepthwiseConvOp.swift
...
...
@@ -58,6 +58,6 @@ class DepthConvOp<P: PrecisionType>: Operator<ConvKernel<P>, ConvParam<P>>, Runa
   func delogOutput() {
     print(" \(type) output: ")
-    print(para.output.metalTexture.toTensor(dim: (n: para.output.originDim[0], c: para.output.originDim[1], h: para.output.originDim[2], w: para.output.originDim[3]), texturePrecision: computePrecision).strideArray())
+    print(para.output.metalTexture.toTensor(dim: (n: para.output.padToFourDim[0], c: para.output.padToFourDim[1], h: para.output.padToFourDim[2], w: para.output.padToFourDim[3])).strideArray())
   }
 }
metal/paddle-mobile/paddle-mobile/Operators/DwConvBNReluOp.swift
...
...
@@ -65,6 +65,6 @@ class DwConvBNReluOp<P: PrecisionType>: Operator<ConvBNReluKernel<P>, ConvBNRelu
   func delogOutput() {
     print(" \(type) output: ")
-    print(para.output.metalTexture.toTensor(dim: (n: para.output.originDim[0], c: para.output.originDim[1], h: para.output.originDim[2], w: para.output.originDim[3]), texturePrecision: computePrecision).strideArray())
+    print(para.output.metalTexture.toTensor(dim: (n: para.output.padToFourDim[0], c: para.output.padToFourDim[1], h: para.output.padToFourDim[2], w: para.output.padToFourDim[3])).strideArray())
   }
 }
metal/paddle-mobile/paddle-mobile/Operators/ElementwiseAddOp.swift
...
...
@@ -71,12 +71,15 @@ class ElementwiseAddOp<P: PrecisionType>: Operator<ElementwiseAddKernel<P>, Elem
     // print(para.inputY.metalTexture.toTensor(dim: (n: para.inputY.tensorDim[0], c: para.inputY.tensorDim[1], h: para.inputY.tensorDim[2], w: para.inputY.tensorDim[3])).strideArray())
     print(" \(type) output: ")
-    let originDim = para.output.originDim
+    print(para.inputY)
+    let padToFourDim = para.output.padToFourDim
     if para.output.transpose == [0, 1, 2, 3] {
-      let outputArray: [Float32] = para.output.metalTexture.realNHWC(dim: (n: originDim[0], h: originDim[1], w: originDim[2], c: originDim[3]), texturePrecision: computePrecision)
+      let outputArray: [Float32] = para.output.metalTexture.realNHWC(dim: (n: padToFourDim[0], h: padToFourDim[1], w: padToFourDim[2], c: padToFourDim[3]))
       print(outputArray.strideArray())
     } else if para.output.transpose == [0, 2, 3, 1] {
-      print(para.output.metalTexture.toTensor(dim: (n: para.output.tensorDim[0], c: para.output.tensorDim[1], h: para.output.tensorDim[2], w: para.output.tensorDim[3]), texturePrecision: computePrecision).strideArray())
+      print(para.output.metalTexture.toTensor(dim: (n: para.output.tensorDim[0], c: para.output.tensorDim[1], h: para.output.tensorDim[2], w: para.output.tensorDim[3])).strideArray())
     } else {
       print(" not implement")
     }
...
...
metal/paddle-mobile/paddle-mobile/Operators/FeedOp.swift
...
...
@@ -61,7 +61,7 @@ class FeedOp<P: PrecisionType>: Operator<Texture2DTo2DArrayKernel<P>, FeedParam<
   func delogOutput() {
     print(" \(type) output: ")
-    print(para.output.metalTexture.toTensor(dim: (n: para.output.originDim[0], c: para.output.originDim[1], h: para.output.originDim[2], w: para.output.originDim[3]), texturePrecision: computePrecision).strideArray())
+    print(para.output.metalTexture.toTensor(dim: (n: para.output.padToFourDim[0], c: para.output.padToFourDim[1], h: para.output.padToFourDim[2], w: para.output.padToFourDim[3])).strideArray())
   }
 }
metal/paddle-mobile/paddle-mobile/Operators/Kernels/BatchNormKernel.swift
...
...
@@ -15,53 +15,60 @@
 import Foundation
 class BatchNormKernel<P: PrecisionType>: Kernel, Computable {
   var newScale: MTLBuffer
   var newBias: MTLBuffer
   required init(device: MTLDevice, param: BatchNormParam<P>) {
     guard let newScale = device.makeBuffer(length: param.inputScale.buffer.length) else {
       fatalError()
     }
     guard let newBias = device.makeBuffer(length: param.inputBias.buffer.length) else {
       fatalError()
     }
     self.newScale = newScale
     self.newBias = newBias
-    super.init(device: device, inFunctionName: "batchnorm")
+    if computePrecision == .Float32 {
+      super.init(device: device, inFunctionName: "batchnorm")
+    } else if computePrecision == .Float16 {
+      super.init(device: device, inFunctionName: "batchnorm_half")
+    } else {
+      fatalError()
+    }
     let varianceBuffer: MTLBuffer = param.inputVariance.buffer
     var invStd: [Float32] = Array(repeating: 0, count: varianceBuffer.length)
     let varianceContents = varianceBuffer.contents().assumingMemoryBound(to: P.self)
     for i in 0..<(varianceBuffer.length / MemoryLayout<P>.stride) {
       invStd[i] = 1 / (Float32(varianceContents[i]) + param.epsilon).squareRoot()
     }
     let newScaleContents = newScale.contents().assumingMemoryBound(to: P.self)
     let newBiasContents = newBias.contents().assumingMemoryBound(to: P.self)
     let scale: MTLBuffer = param.inputScale.buffer
     let scaleContents = scale.contents().assumingMemoryBound(to: P.self)
     let bias: MTLBuffer = param.inputBias.buffer
     let biasContents = bias.contents().assumingMemoryBound(to: P.self)
     let meanContents = param.inputMean.buffer.contents().assumingMemoryBound(to: P.self)
     for i in 0..<(newScale.length / MemoryLayout<P>.stride) {
       newScaleContents[i] = P(invStd[i] * Float32(scaleContents[i]))
       newBiasContents[i] = P(Float32(biasContents[i]) - Float32(meanContents[i]) * invStd[i] * Float32(scaleContents[i]))
     }
   }
   func compute(commandBuffer: MTLCommandBuffer, param: BatchNormParam<P>) throws {
     guard let encoder = commandBuffer.makeComputeCommandEncoder() else {
       throw PaddleMobileError.predictError(message: " encoder is nil")
     }
     print("BatchNorm compute")
     encoder.setTexture(param.input.metalTexture, index: 0)
     encoder.setTexture(param.output.metalTexture, index: 1)
     encoder.setBuffer(newScale, offset: 0, index: 0)
     encoder.setBuffer(newBias, offset: 0, index: 1)
     encoder.dispatch(computePipline: pipline, outTexture: param.output.metalTexture)
     encoder.endEncoding()
   }
 }
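The initializer in this hunk folds the batch-norm statistics into one scale and one bias per channel, so the shader only has to multiply and add. Below is a minimal CPU-side sketch of that folding. It assumes plain `[Float]` arrays in place of the `MTLBuffer` contents, and `foldBatchNorm` is a hypothetical helper name, not part of paddle-mobile.

```swift
import Foundation

/// Folds batch-norm statistics into a per-channel scale and bias.
/// Hypothetical helper for illustration only; plain arrays stand in for MTLBuffer contents.
func foldBatchNorm(scale: [Float], bias: [Float], mean: [Float],
                   variance: [Float], epsilon: Float) -> (newScale: [Float], newBias: [Float]) {
    precondition(scale.count == bias.count && bias.count == mean.count && mean.count == variance.count)
    var newScale = [Float](repeating: 0, count: scale.count)
    var newBias = [Float](repeating: 0, count: scale.count)
    for i in 0..<scale.count {
        // Same expressions as in the diff: invStd = 1 / sqrt(variance + epsilon)
        let invStd = 1 / (variance[i] + epsilon).squareRoot()
        newScale[i] = invStd * scale[i]
        newBias[i] = bias[i] - mean[i] * invStd * scale[i]
    }
    return (newScale, newBias)
}

// One channel: scale 2, bias 1, mean 3, variance 4, epsilon 0.
let folded = foldBatchNorm(scale: [2], bias: [1], mean: [3], variance: [4], epsilon: 0)

// The folded form x * newScale + newBias should match the textbook batch-norm formula.
let x: Float = 5
let direct = (x - 3) / Float(4).squareRoot() * 2 + 1
let viaFolded = x * folded.newScale[0] + folded.newBias[0]
print(direct, viaFolded)
```

Precomputing the two arrays once at init time keeps the square root and division out of the per-pixel work.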
metal/paddle-mobile/paddle-mobile/Operators/Kernels/ConvAddBatchNormReluKernel.swift
...
...
@@ -49,26 +49,37 @@ class ConvAddBatchNormReluKernel<P: PrecisionType>: Kernel, Computable, Testable
   var metalParam: MetalConvParam!
   required init(device: MTLDevice, param: ConvAddBatchNormReluParam<P>) {
     param.output.initTexture(device: device, inTranspose: [0, 2, 3, 1], computePrecision: computePrecision)
-    if param.filter.width == 1 && param.filter.height == 1 {
-      super.init(device: device, inFunctionName: "conv_add_batch_norm_relu_1x1")
-    } else if param.filter.channel == 1 {
-      super.init(device: device, inFunctionName: "depthwise_conv_add_batch_norm_relu_3x3")
-    } else {
-      super.init(device: device, inFunctionName: "conv_add_batch_norm_relu_3x3")
-    }
     param.filter.initBuffer(device: device, precision: computePrecision)
     param.y.initBuffer(device: device, precision: computePrecision)
     param.variance.initBuffer(device: device, precision: .Float32)
     param.mean.initBuffer(device: device, precision: .Float32)
     param.scale.initBuffer(device: device, precision: .Float32)
     param.bias.initBuffer(device: device, precision: .Float32)
+    if computePrecision == .Float32 {
+      if param.filter.width == 1 && param.filter.height == 1 {
+        super.init(device: device, inFunctionName: "conv_add_batch_norm_relu_1x1")
+      } else if param.filter.channel == 1 {
+        super.init(device: device, inFunctionName: "depthwise_conv_add_batch_norm_relu_3x3")
+      } else if param.filter.width == 3 && param.filter.height == 3 {
+        super.init(device: device, inFunctionName: "conv_add_batch_norm_relu_3x3")
+      } else {
+        fatalError(" unsupport ")
+      }
+    } else if computePrecision == .Float16 {
+      if param.filter.width == 1 && param.filter.height == 1 {
+        super.init(device: device, inFunctionName: "conv_add_batch_norm_relu_1x1_half")
+      } else if param.filter.channel == 1 {
+        super.init(device: device, inFunctionName: "depthwise_conv_add_batch_norm_relu_3x3_half")
+      } else if param.filter.width == 3 && param.filter.height == 3 {
+        super.init(device: device, inFunctionName: "conv_add_batch_norm_relu_3x3_half")
+      } else {
+        fatalError(" unsupport ")
+      }
+    } else {
+      fatalError()
+    }
     let offsetX = param.filter.width / 2 - Int(param.paddings[0])
     let offsetY = param.filter.height / 2 - Int(param.paddings[1])
...
...
@@ -108,10 +119,10 @@ class ConvAddBatchNormReluKernel<P: PrecisionType>: Kernel, Computable, Testable
     var newBiaseBuffer: MTLBuffer
     var newScaleBuffer: MTLBuffer
-    if computePrecision == .Float16 {
+    if computePrecision == .Float32 {
       newBiaseBuffer = device.makeBuffer(bytes: newBiase, length: param.bias.buffer.length)!
       newScaleBuffer = device.makeBuffer(bytes: newScale, length: param.scale.buffer.length)!
-    } else if computePrecision == .Float32 {
+    } else if computePrecision == .Float16 {
       newBiaseBuffer = device.makeBuffer(length: param.bias.buffer.length / 2)!
       newScaleBuffer = device.makeBuffer(length: param.bias.buffer.length / 2)!
...
...
@@ -138,7 +149,6 @@ class ConvAddBatchNormReluKernel<P: PrecisionType>: Kernel, Computable, Testable
       throw PaddleMobileError.predictError(message: " encode is nil")
     }
     encoder.setTexture(param.input.metalTexture, index: 0)
     encoder.setTexture(param.output.metalTexture, index: 1)
     encoder.setBytes(&metalParam, length: MemoryLayout<MetalConvParam>.size, index: 0)
...
...
metal/paddle-mobile/paddle-mobile/Operators/Kernels/ConvAddKernel.swift
...
...
@@ -17,14 +17,23 @@ import Foundation
 class ConvAddKernel<P: PrecisionType>: Kernel, Computable {
   var metalParam: MetalConvParam!
   required init(device: MTLDevice, param: ConvAddParam<P>) {
+    param.output.initTexture(device: device, inTranspose: [0, 2, 3, 1], computePrecision: computePrecision)
+    param.filter.initBuffer(device: device, precision: computePrecision)
+    param.y.initBuffer(device: device, precision: computePrecision)
     if computePrecision == .Float16 {
       if param.filter.width == 1 && param.filter.height == 1 {
         super.init(device: device, inFunctionName: "conv_add_1x1_half")
       } else if param.filter.channel == 1 {
         super.init(device: device, inFunctionName: "depthwise_conv_add_3x3_half")
-      } else {
+      } else if param.filter.width == 3 && param.filter.height == 3 {
         super.init(device: device, inFunctionName: "conv_add_3x3_half")
+      } else if param.filter.width == 1 && param.filter.height == 5 {
+        super.init(device: device, inFunctionName: "conv_add_5x1_half")
+      } else if param.filter.width == 5 && param.filter.height == 1 {
+        super.init(device: device, inFunctionName: "conv_add_1x5_half")
+      } else {
+        fatalError(" unsupport yet ")
       }
     } else if computePrecision == .Float32 {
       if param.filter.width == 1 && param.filter.height == 1 {
...
...
@@ -35,22 +44,21 @@ class ConvAddKernel<P: PrecisionType>: Kernel, Computable {
         super.init(device: device, inFunctionName: "conv_add_5x1")
       } else if param.filter.width == 5 && param.filter.height == 1 {
         super.init(device: device, inFunctionName: "conv_add_1x5")
-      } else {
+      } else if param.filter.width == 3 && param.filter.height == 3 {
         super.init(device: device, inFunctionName: "conv_add_3x3")
+      } else {
+        fatalError(" unsupport yet ")
       }
     } else {
       fatalError()
     }
     let offsetY = (Int(param.dilations[1]) * (param.filter.height - 1) + 1) / 2 - Int(param.paddings[1])
     let offsetX = (Int(param.dilations[0]) * (param.filter.width - 1) + 1) / 2 - Int(param.paddings[0])
-    param.output.initTexture(device: device, inTranspose: [0, 2, 3, 1], computePrecision: computePrecision)
-    param.filter.initBuffer(device: device, precision: computePrecision)
-    param.y.initBuffer(device: device, precision: computePrecision)
     print(" function: \(functionName)")
     print("offset x: \(offsetX)")
     print("offset y: \(offsetY)")
...
...
metal/paddle-mobile/paddle-mobile/Operators/Kernels/ConvBNReluKernel.swift
...
...
@@ -49,35 +49,41 @@ class ConvBNReluKernel<P: PrecisionType>: Kernel, Computable, Testable {
   }
   var metalParam: MetalConvParam!
   required init(device: MTLDevice, param: ConvBNReluParam<P>) {
+    param.output.initTexture(device: device, inTranspose: [0, 2, 3, 1], computePrecision: computePrecision)
+    param.filter.initBuffer(device: device, precision: computePrecision)
+    param.variance.initBuffer(device: device, precision: .Float32)
+    param.mean.initBuffer(device: device, precision: .Float32)
+    param.scale.initBuffer(device: device, precision: .Float32)
+    param.bias.initBuffer(device: device, precision: .Float32)
     if computePrecision == .Float32 {
       if param.filter.width == 1 && param.filter.height == 1 {
         super.init(device: device, inFunctionName: "conv_batch_norm_relu_1x1")
       } else if param.filter.channel == 1 {
         super.init(device: device, inFunctionName: "depthwise_conv_batch_norm_relu_3x3")
-      } else {
+      } else if param.filter.width == 3 && param.filter.height == 3 {
         super.init(device: device, inFunctionName: "conv_batch_norm_relu_3x3")
+      } else {
+        fatalError(" unsupport ")
       }
     } else if computePrecision == .Float16 {
       if param.filter.width == 1 && param.filter.height == 1 {
         super.init(device: device, inFunctionName: "conv_batch_norm_relu_1x1_half")
       } else if param.filter.channel == 1 {
         super.init(device: device, inFunctionName: "depthwise_conv_batch_norm_relu_3x3_half")
-      } else {
+      } else if param.filter.width == 3 && param.filter.height == 3 {
         super.init(device: device, inFunctionName: "conv_batch_norm_relu_3x3_half")
+      } else {
+        fatalError(" unsupport ")
       }
     } else {
       fatalError()
     }
-    param.output.initTexture(device: device, inTranspose: [0, 2, 3, 1], computePrecision: computePrecision)
-    param.filter.initBuffer(device: device, precision: computePrecision)
-    param.variance.initBuffer(device: device, precision: .Float32)
-    param.mean.initBuffer(device: device, precision: .Float32)
-    param.scale.initBuffer(device: device, precision: .Float32)
-    param.bias.initBuffer(device: device, precision: .Float32)
     let offsetX = param.filter.width / 2 - Int(param.paddings[0])
     let offsetY = param.filter.height / 2 - Int(param.paddings[1])
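The nested `if` chains in this hunk pick a Metal function name from the filter shape and then append `_half` for `Float16`. The sketch below restates that selection as a pure function so the rule is easier to see. The `Precision` enum and `convBNReluFunctionName` are hypothetical stand-ins for illustration; the real code calls `super.init(device:inFunctionName:)` directly and hits `fatalError` on unsupported shapes instead of returning `nil`.

```swift
/// Hypothetical stand-in for the project's compute-precision setting.
enum Precision { case float32, float16 }

/// Returns the shader function name the hunk above would choose,
/// or nil where the original code calls fatalError(" unsupport ").
func convBNReluFunctionName(width: Int, height: Int, channel: Int,
                            precision: Precision) -> String? {
    let base: String
    if width == 1 && height == 1 {
        base = "conv_batch_norm_relu_1x1"
    } else if channel == 1 {
        // Checked before the 3x3 case, in the same order as the diff.
        base = "depthwise_conv_batch_norm_relu_3x3"
    } else if width == 3 && height == 3 {
        base = "conv_batch_norm_relu_3x3"
    } else {
        return nil
    }
    return precision == .float16 ? base + "_half" : base
}

print(convBNReluFunctionName(width: 3, height: 3, channel: 32, precision: .float16) ?? "unsupported")
print(convBNReluFunctionName(width: 7, height: 7, channel: 32, precision: .float32) ?? "unsupported")
```

Computing the name once and appending the suffix avoids repeating the shape checks per precision, at the cost of assuming every shader has a `_half` twin with the same base name.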
...
...
metal/paddle-mobile/paddle-mobile/Operators/Kernels/ConvKernel.swift
...
...
@@ -27,18 +27,20 @@ public struct MetalConvParam {
 class ConvKernel<P: PrecisionType>: Kernel, Computable {
   var metalParam: MetalConvParam!
   required init(device: MTLDevice, param: ConvParam<P>) {
+    param.filter.initBuffer(device: device, precision: ComputePrecision.Float32)
     if param.filter.width == 1 && param.filter.height == 1 {
       super.init(device: device, inFunctionName: "conv_1x1")
     } else if param.filter.channel == 1 {
       super.init(device: device, inFunctionName: "depthwise_conv_3x3")
-    } else {
+    } else if param.filter.width == 3 && param.filter.height == 3 {
       super.init(device: device, inFunctionName: "conv_3x3")
+    } else {
+      fatalError(" unsupport ")
     }
     let offsetX = param.filter.dim[2] / 2 - Int(param.paddings[0])
     let offsetY = param.filter.dim[1] / 2 - Int(param.paddings[1])
     let offsetZ = 0.0
-    param.filter.initBuffer(device: device, precision: ComputePrecision.Float32)
     metalParam = MetalConvParam.init(offsetX: Int16(offsetX), offsetY: Int16(offsetY), offsetZ: Int16(offsetZ), strideX: UInt16(param.stride[0]), strideY: UInt16(param.stride[1]), dilationX: UInt16(param.dilations[0]), dilationY: UInt16(param.dilations[1]))
   }
...
...
metal/paddle-mobile/paddle-mobile/Operators/Kernels/ConvTransposeKernel.swift
...
...
@@ -31,7 +31,27 @@ struct MetalConvTransposeParam {
 class ConvTransposeKernel<P: PrecisionType>: Kernel, Computable {
   var metalParam: MetalConvTransposeParam!
   required init(device: MTLDevice, param: ConvTransposeParam<P>) {
-    super.init(device: device, inFunctionName: "conv_transpose")
+    param.output.initTexture(device: device, inTranspose: param.input.transpose, computePrecision: computePrecision)
+    param.filter.initBuffer(device: device, precision: computePrecision, convertToNHWC: false, withTranspose: true)
+    if computePrecision == .Float32 {
+      if param.stride == [2, 2] && param.stride == [2, 2] {
+        super.init(device: device, inFunctionName: "conv_transpose2x2_stride2")
+      } else {
+        fatalError(" -- conv transpose unsupported yet -- ")
+      }
+    } else if computePrecision == .Float16 {
+      if param.stride == [2, 2] && param.stride == [2, 2] {
+        super.init(device: device, inFunctionName: "conv_transpose2x2_stride2_half")
+      } else {
+        fatalError(" -- conv transpose unsupported yet -- ")
+      }
+    } else {
+      fatalError()
+    }
+    // let filter: [Float32] = param.filter.buffer.array()
+    // print(" conv transpose filter")
+    // print(filter)
     let kernelWidth = UInt16(param.filter.width)
     let kernelHeight = UInt16(param.filter.height)
...
...
@@ -43,9 +63,7 @@ class ConvTransposeKernel<P: PrecisionType>: Kernel, Computable{
     let dilationY = UInt16(param.dilations[1])
     metalParam = MetalConvTransposeParam.init(kernelW: kernelWidth, kernelH: kernelHeight, strideX: strideX, strideY: strideY, paddingX: paddingX, paddingY: paddingY, dilationX: dilationX, dilationY: dilationY)
-    param.output.initTexture(device: device, inTranspose: param.input.transpose)
-    param.filter.initBuffer(device: device)
   }
   func compute(commandBuffer: MTLCommandBuffer, param: ConvTransposeParam<P>) throws {
...
...
metal/paddle-mobile/paddle-mobile/Operators/Kernels/ElementwiseAddKernel.swift
...
...
@@ -15,6 +15,7 @@
 import Foundation
 struct ElementwiseAddMetalParam {
+  var unsafe_one_dim: Int32 = 0
   var fast: Int32 = 0
   var axis: Int32 = 0
   var yoff: Int32 = 0
...
...
@@ -26,8 +27,14 @@ struct ElementwiseAddMetalParam {
 class ElementwiseAddKernel<P: PrecisionType>: Kernel, Computable {
   required init(device: MTLDevice, param: ElementwiseAddParam<P>) {
-    super.init(device: device, inFunctionName: "elementwise_add")
     param.output.initTexture(device: device, inTranspose: param.inputX.transpose, computePrecision: computePrecision)
+    if computePrecision == .Float32 {
+      super.init(device: device, inFunctionName: "elementwise_add")
+    } else if computePrecision == .Float16 {
+      super.init(device: device, inFunctionName: "elementwise_add_half")
+    } else {
+      fatalError()
+    }
   }
   func compute(commandBuffer: MTLCommandBuffer, param: ElementwiseAddParam<P>) throws {
...
...
@@ -59,6 +66,11 @@ class ElementwiseAddKernel<P: PrecisionType>: Kernel, Computable {
       emp.fast = 1
     }
+    // TODO:
+    if param.inputY.tensorDim.cout() == 1 {
+      emp.unsafe_one_dim = 1;
+    }
     encoder.setBytes(&emp, length: MemoryLayout<ElementwiseAddMetalParam>.size, index: 0)
     encoder.dispatch(computePipline: pipline, outTexture: param.output.metalTexture)
     encoder.endEncoding()
...
...
metal/paddle-mobile/paddle-mobile/Operators/Kernels/PoolKernel.swift
...
...
@@ -27,8 +27,14 @@ struct PoolMetalParam {
 class PoolKernel<P: PrecisionType>: Kernel, Computable {
   required init(device: MTLDevice, param: PoolParam<P>) {
-    super.init(device: device, inFunctionName: "pool")
     param.output.initTexture(device: device, inTranspose: param.input.transpose, computePrecision: computePrecision)
+    if computePrecision == .Float32 {
+      super.init(device: device, inFunctionName: "pool")
+    } else if computePrecision == .Float16 {
+      super.init(device: device, inFunctionName: "pool_half")
+    } else {
+      fatalError()
+    }
   }
   func compute(commandBuffer: MTLCommandBuffer, param: PoolParam<P>) throws {
...
...
metal/paddle-mobile/paddle-mobile/Operators/Kernels/PreluKernel.swift
...
...
@@ -10,15 +10,27 @@ import Foundation
 class PreluKernel<P: PrecisionType>: Kernel, Computable {
   required init(device: MTLDevice, param: PreluParam<P>) {
-    if param.mode == "channel" {
-      super.init(device: device, inFunctionName: "prelu_channel")
-    } else if param.mode == "element" {
-      super.init(device: device, inFunctionName: "prelu_element")
-    } else {
-      super.init(device: device, inFunctionName: "prelu_other")
-    }
     param.alpha.initBuffer(device: device, precision: computePrecision)
     param.output.initTexture(device: device, inTranspose: param.input.transpose, computePrecision: computePrecision)
+    if computePrecision == .Float32 {
+      if param.mode == "channel" {
+        super.init(device: device, inFunctionName: "prelu_channel")
+      } else if param.mode == "element" {
+        super.init(device: device, inFunctionName: "prelu_element")
+      } else {
+        super.init(device: device, inFunctionName: "prelu_other")
+      }
+    } else if computePrecision == .Float16 {
+      if param.mode == "channel" {
+        super.init(device: device, inFunctionName: "prelu_channel_half")
+      } else if param.mode == "element" {
+        super.init(device: device, inFunctionName: "prelu_element_half")
+      } else {
+        super.init(device: device, inFunctionName: "prelu_other_half")
+      }
+    } else {
+      fatalError()
+    }
   }
   func compute(commandBuffer: MTLCommandBuffer, param: PreluParam<P>) throws {
...
...
metal/paddle-mobile/paddle-mobile/Operators/Kernels/PriorBoxKernel.swift
View file @
f9a87f61
...
...
@@ -33,6 +33,10 @@ class PriorBoxKernel<P: PrecisionType>: Kernel, Computable{
    var metalParam: PriorBoxMetalParam!
    required init(device: MTLDevice, param: PriorBoxParam<P>) {
        param.output.initTexture(device: device, inTranspose: [2, 0, 1, 3], computePrecision: computePrecision)
        param.outputVariances.initTexture(device: device, inTranspose: [2, 0, 1, 3], computePrecision: computePrecision)
        if computePrecision == .Float32 {
            super.init(device: device, inFunctionName: "prior_box")
        } else if computePrecision == .Float16 {
...
...
@@ -41,9 +45,6 @@ class PriorBoxKernel<P: PrecisionType>: Kernel, Computable{
            fatalError()
        }
        param.output.initTexture(device: device, inTranspose: [2, 0, 1, 3], computePrecision: computePrecision)
        param.outputVariances.initTexture(device: device, inTranspose: [2, 0, 1, 3], computePrecision: computePrecision)
        let n = 1
        let h = param.output.dim[1]
        let w = param.output.dim[2]
...
...
@@ -52,11 +53,11 @@ class PriorBoxKernel<P: PrecisionType>: Kernel, Computable{
        param.output.dim = Dim.init(inDim: [n, h, w, c])
        param.output.transpose = [0, 1, 2, 3]
        let imageWidth = Float32(param.inputImage.originDim[3])
        let imageHeight = Float32(param.inputImage.originDim[2])
        let imageWidth = Float32(param.inputImage.padToFourDim[3])
        let imageHeight = Float32(param.inputImage.padToFourDim[2])
        let featureWidth = param.input.originDim[3]
        let featureHeight = param.input.originDim[2]
        let featureWidth = param.input.padToFourDim[3]
        let featureHeight = param.input.padToFourDim[2]
        if param.stepW == 0 || param.stepH == 0 {
            param.stepW = Float32(imageWidth) / Float32(featureWidth)
...
...
metal/paddle-mobile/paddle-mobile/Operators/Kernels/ReluKernel.swift
View file @
f9a87f61
...
...
@@ -15,17 +15,23 @@
import Foundation
class ReluKernel<P: PrecisionType>: Kernel, Computable {
    func compute(commandBuffer: MTLCommandBuffer, param: ReluParam<P>) throws {
        guard let encoder = commandBuffer.makeComputeCommandEncoder() else {
            throw PaddleMobileError.predictError(message: " encode is nil")
        }
        encoder.setTexture(param.input.metalTexture, index: 0)
        encoder.setTexture(param.output.metalTexture, index: 1)
        encoder.dispatch(computePipline: pipline, outTexture: param.output.metalTexture)
        encoder.endEncoding()
    func compute(commandBuffer: MTLCommandBuffer, param: ReluParam<P>) throws {
        guard let encoder = commandBuffer.makeComputeCommandEncoder() else {
            throw PaddleMobileError.predictError(message: " encode is nil")
        }
    required init(device: MTLDevice, param: ReluParam<P>) {
        super.init(device: device, inFunctionName: "relu")
        encoder.setTexture(param.input.metalTexture, index: 0)
        encoder.setTexture(param.output.metalTexture, index: 1)
        encoder.dispatch(computePipline: pipline, outTexture: param.output.metalTexture)
        encoder.endEncoding()
    }
    required init(device: MTLDevice, param: ReluParam<P>) {
        if computePrecision == .Float32 {
            super.init(device: device, inFunctionName: "relu")
        } else if computePrecision == .Float16 {
            super.init(device: device, inFunctionName: "relu_half")
        } else {
            fatalError()
        }
    }
}
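The kernel initializers changed in this commit share one pattern: the Metal function name is chosen from the compute precision, and any other precision aborts. The following is a minimal C++ sketch of that selection logic, not code from the commit. The `Precision` enum and the `kernelFunctionName` helper are hypothetical names introduced for illustration; only the `_half` suffix convention is taken from the function names that appear in the diff.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the Swift compute-precision enum.
enum class Precision { Float32, Float16 };

// Returns the Metal function name for a base kernel name:
// the base name itself for Float32, base + "_half" for Float16.
std::string kernelFunctionName(const std::string& base, Precision precision) {
    assert(!base.empty());
    switch (precision) {
        case Precision::Float32: return base;
        case Precision::Float16: return base + "_half";
    }
    // Plays the role of the fatalError() branch in the Swift initializers.
    throw std::logic_error("unsupported precision");
}
```

With such a helper, `kernelFunctionName("relu", Precision::Float16)` yields the same `"relu_half"` string that the Swift initializer passes as `inFunctionName`.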
metal/paddle-mobile/paddle-mobile/Operators/Kernels/ResizeKernel.swift
Deleted
100644 → 0
View file @
cbfc1d74
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
//
//import Foundation
//import MetalPerformanceShaders
//
//
//struct ResizeParam: OpParam{
// typealias OutputType = <#type#>
//
// typealias ParamPrecisionType = <#type#>
//
// let input: MTLTexture
// let output: MTLTexture
// let expectDim: Dim
//}
//
//struct OutputDim {
// let width: UInt16
// let height: UInt16
// let strideX: UInt16
// let strideY: UInt16
//}
//
//class ResizeKernel<P: PrecisionType>: Kernel, Computable{
// var lanczos: MPSImageLanczosScale
// required init(device: MTLDevice, param: ResizeParam) {
// lanczos = MPSImageLanczosScale.init(device: device)
// super.init(device: device, inFunctionName: "resize")
// }
// func compute(commandBuffer: MTLCommandBuffer, param: ResizeParam) throws {
//// guard let encoder = commandBuffer.makeComputeCommandEncoder() else {
//// throw PaddleMobileError.predictError(message: " encode is nil")
//// }
// lanczos.encode(commandBuffer: commandBuffer, sourceTexture: param.input, destinationTexture: param.output)
//
//// encoder.setTexture(param.input, index: 0)
//// encoder.setTexture(param.output, index: 1)
//// let strideX = param.input.width/param.expectDim[2]
//// let strideY = param.input.height/param.expectDim[1]
//// var outputDim = OutputDim.init(width: UInt16(param.expectDim[1]), height: UInt16(param.expectDim[2]), strideX: UInt16(strideX), strideY: UInt16(strideY))
//// encoder.setBytes(&outputDim, length: MemoryLayout<OutputDim>.size, index: 0)
//// encoder.dispatch(computePipline: pipline, outTexture: param.output)
//// encoder.endEncoding()
// }
//
//
//
//
//}
metal/paddle-mobile/paddle-mobile/Operators/Kernels/SoftmaxKernel.swift
View file @
f9a87f61
...
...
@@ -21,6 +21,17 @@ struct SoftmaxMetalParam {
class SoftmaxKernel<P: PrecisionType>: Kernel, Computable {
    required init(device: MTLDevice, param: SoftmaxParam<P>) {
        param.output.initTexture(device: device, computePrecision: computePrecision)
        if computePrecision == .Float32 {
            super.init(device: device, inFunctionName: "softmax")
        } else if computePrecision == .Float16 {
            super.init(device: device, inFunctionName: "softmax_half")
        } else {
            fatalError()
        }
    }
    func compute(commandBuffer: MTLCommandBuffer, param: SoftmaxParam<P>) throws {
        guard let encoder = commandBuffer.makeComputeCommandEncoder() else {
            throw PaddleMobileError.predictError(message: " encoder is nil")
...
...
@@ -32,19 +43,12 @@ class SoftmaxKernel<P: PrecisionType>: Kernel, Computable{
            N: Int32(param.input.tensorDim[0]),
            K: Int32(param.input.tensorDim[1])
        )
        print(" soft max param: ")
        print(smp)
        encoder.setBytes(&smp, length: MemoryLayout<SoftmaxMetalParam>.size, index: 0)
        encoder.dispatch(computePipline: pipline, outTexture: param.output.metalTexture)
        encoder.endEncoding()
    }
    required init(device: MTLDevice, param: SoftmaxParam<P>) {
        param.output.initTexture(device: device, computePrecision: computePrecision)
        if computePrecision == .Float32 {
            super.init(device: device, inFunctionName: "softmax")
        } else if computePrecision == .Float16 {
            super.init(device: device, inFunctionName: "softmax_half")
        } else {
            fatalError()
        }
    }
}
metal/paddle-mobile/paddle-mobile/Operators/Kernels/metal/ConvAddMetal.metal
View file @
f9a87f61
...
...
@@ -429,7 +429,122 @@ kernel void depthwise_conv_add_3x3_half(texture2d_array<half, access::sample> in
}
kernel void conv_add_5x1_half(texture2d_array<half, access::sample> inTexture [[texture(0)]],
texture2d_array<half, access::write> outTexture [[texture(1)]],
constant MetalConvParam &param [[buffer(0)]],
const device half4 *weights [[buffer(1)]],
const device half4 *biase [[buffer(2)]],
uint3 gid [[thread_position_in_grid]]) {
if (gid.x >= outTexture.get_width() ||
gid.y >= outTexture.get_height() ||
gid.z >= outTexture.get_array_size()) {
return;
}
ushort2 stride = ushort2(param.strideX, param.strideY);
const ushort2 posInInput = ushort2(gid.xy) * stride + ushort2(param.offsetX, param.offsetY);
constexpr sampler sample(coord::pixel, filter::nearest, address::clamp_to_zero);
const uint kernelHXW = 5;
uint input_arr_size = inTexture.get_array_size();
uint weithTo = gid.z * kernelHXW * input_arr_size * 4;
float4 output = float4(0.0);
ushort dilation_y = param.dilationY;
half4 input[5];
for (uint i = 0; i < input_arr_size; ++i) {
input[0] = inTexture.sample(sample, float2(posInInput.x, posInInput.y - 2 * dilation_y), i);
input[1] = inTexture.sample(sample, float2(posInInput.x, posInInput.y - dilation_y), i);
input[2] = inTexture.sample(sample, float2(posInInput.x, posInInput.y), i);
input[3] = inTexture.sample(sample, float2(posInInput.x, posInInput.y + dilation_y), i);
input[4] = inTexture.sample(sample, float2(posInInput.x, posInInput.y + 2 * dilation_y), i);
for (int j = 0; j < 5; ++j) {
half4 weight_x = weights[weithTo + 0 * kernelHXW * input_arr_size + j * input_arr_size + i];
output.x += dot(input[j], weight_x);
half4 weight_y = weights[weithTo + 1 * kernelHXW * input_arr_size + j * input_arr_size + i];
output.y += dot(input[j], weight_y);
half4 weight_z = weights[weithTo + 2 * kernelHXW * input_arr_size + j * input_arr_size + i];
output.z += dot(input[j], weight_z);
half4 weight_w = weights[weithTo + 3 * kernelHXW * input_arr_size + j * input_arr_size + i];
output.w += dot(float4(input[j]), float4(weight_w));
}
}
output = output + float4(biase[gid.z]);
outTexture.write(half4(output), gid.xy, gid.z);
}
kernel void conv_add_1x5_half(texture2d_array<half, access::sample> inTexture [[texture(0)]],
texture2d_array<half, access::write> outTexture [[texture(1)]],
constant MetalConvParam &param [[buffer(0)]],
const device half4 *weights [[buffer(1)]],
const device half4 *biase [[buffer(2)]],
uint3 gid [[thread_position_in_grid]]) {
if (gid.x >= outTexture.get_width() ||
gid.y >= outTexture.get_height() ||
gid.z >= outTexture.get_array_size()) {
return;
}
ushort2 stride = ushort2(param.strideX, param.strideY);
const ushort2 posInInput = ushort2(gid.xy) * stride + ushort2(param.offsetX, param.offsetY);
constexpr sampler sample(coord::pixel, filter::nearest, address::clamp_to_zero);
const uint kernelHXW = 5;
uint input_arr_size = inTexture.get_array_size();
uint weithTo = gid.z * kernelHXW * input_arr_size * 4;
float4 output = float4(0.0);
ushort dilation_x = param.dilationX;
half4 input[5];
for (uint i = 0; i < input_arr_size; ++i) {
input[0] = inTexture.sample(sample, float2(posInInput.x - 2 * dilation_x, posInInput.y), i);
input[1] = inTexture.sample(sample, float2(posInInput.x - dilation_x, posInInput.y), i);
input[2] = inTexture.sample(sample, float2(posInInput.x, posInInput.y), i);
input[3] = inTexture.sample(sample, float2(posInInput.x + dilation_x, posInInput.y), i);
input[4] = inTexture.sample(sample, float2(posInInput.x + 2 * dilation_x, posInInput.y), i);
for (int j = 0; j < 5; ++j) {
half4 weight_x = weights[weithTo + 0 * kernelHXW * input_arr_size + j * input_arr_size + i];
output.x += dot(input[j], weight_x);
half4 weight_y = weights[weithTo + 1 * kernelHXW * input_arr_size + j * input_arr_size + i];
output.y += dot(input[j], weight_y);
half4 weight_z = weights[weithTo + 2 * kernelHXW * input_arr_size + j * input_arr_size + i];
output.z += dot(input[j], weight_z);
half4 weight_w = weights[weithTo + 3 * kernelHXW * input_arr_size + j * input_arr_size + i];
output.w += dot(input[j], weight_w);
}
}
output = output + float4(biase[gid.z]);
outTexture.write(half4(output), gid.xy, gid.z);
}
kernel void test_conv_add_3x3(texture2d_array<float, access::sample> inTexture [[texture(0)]],
...
...
@@ -502,3 +617,6 @@ kernel void test_conv_add_3x3(texture2d_array<float, access::sample> inTexture [
// output = output + biase[gid.z];
outTexture.write(output, gid.xy, gid.z);
}
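The `conv_add_5x1_half` and `conv_add_1x5_half` kernels above each gather five taps around the input position, spaced by the dilation, and rely on `address::clamp_to_zero` so that taps outside the texture contribute zero. The C++ sketch below reproduces that gather on a single 1-D column of floats. It is an illustration under assumptions: `sampleClampToZero` and `gatherFiveTaps` are hypothetical helper names, and plain signed integer arithmetic stands in for the kernel's coordinate math.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Clamp-to-zero read: out-of-range positions return 0, mimicking the
// address::clamp_to_zero sampler mode used by the kernels.
float sampleClampToZero(const std::vector<float>& column, int index) {
    if (index < 0 || index >= static_cast<int>(column.size())) {
        return 0.0f;
    }
    return column[static_cast<std::size_t>(index)];
}

// Gathers the five taps at center + (j - 2) * dilation for j = 0..4,
// the same offsets as input[0]..input[4] in the 5-tap kernels.
std::array<float, 5> gatherFiveTaps(const std::vector<float>& column,
                                    int center, int dilation) {
    assert(dilation >= 1);
    std::array<float, 5> taps{};
    for (int j = 0; j < 5; ++j) {
        taps[static_cast<std::size_t>(j)] =
            sampleClampToZero(column, center + (j - 2) * dilation);
    }
    return taps;
}
```

For a column `{1, 2, 3, 4, 5}` with `center = 1` and `dilation = 2`, the tap positions are -3, -1, 1, 3 and 5, so only the taps at positions 1 and 3 read real data.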
metal/paddle-mobile/paddle-mobile/Operators/Kernels/metal/ConvKernel.metal
View file @
f9a87f61
...
...
@@ -148,4 +148,133 @@ kernel void conv_1x1(texture2d_array<float, access::sample> inTexture [[texture(
}
kernel void conv_3x3_half(texture2d_array<half, access::sample> inTexture [[texture(0)]],
texture2d_array<half, access::write> outTexture [[texture(1)]],
constant MetalConvParam &param [[buffer(0)]],
const device half4 *weights [[buffer(1)]],
uint3 gid [[thread_position_in_grid]]) {
if (gid.x >= outTexture.get_width() ||
gid.y >= outTexture.get_height() ||
gid.z >= outTexture.get_array_size()) {
return;
}
ushort2 stride = ushort2(param.strideX, param.strideY);
const ushort2 posInInput = ushort2(gid.xy) * stride + ushort2(param.offsetX, param.offsetY);
constexpr sampler sample(coord::pixel, filter::nearest, address::clamp_to_zero);
const uint kernelHXW = 9;
uint input_arr_size = inTexture.get_array_size();
uint weithTo = gid.z * kernelHXW * input_arr_size * 4;
float4 output = float4(0.0);
half4 input[9];
for (uint i = 0; i < input_arr_size; ++i) {
input[0] = inTexture.sample(sample, float2(posInInput.x - 1, posInInput.y - 1), i);
input[1] = inTexture.sample(sample, float2(posInInput.x, posInInput.y - 1), i);
input[2] = inTexture.sample(sample, float2(posInInput.x + 1, posInInput.y - 1), i);
input[3] = inTexture.sample(sample, float2(posInInput.x - 1, posInInput.y), i);
input[4] = inTexture.sample(sample, float2(posInInput.x, posInInput.y), i);
input[5] = inTexture.sample(sample, float2(posInInput.x + 1, posInInput.y), i);
input[6] = inTexture.sample(sample, float2(posInInput.x - 1, posInInput.y + 1), i);
input[7] = inTexture.sample(sample, float2(posInInput.x, posInInput.y + 1), i);
input[8] = inTexture.sample(sample, float2(posInInput.x + 1, posInInput.y + 1), i);
for (int j = 0; j < 9; ++j) {
half4 weight_x = weights[weithTo + 0 * kernelHXW * input_arr_size + j * input_arr_size + i];
output.x += dot(float4(input[j]), float4(weight_x));
half4 weight_y = weights[weithTo + 1 * kernelHXW * input_arr_size + j * input_arr_size + i];
output.y += dot(float4(input[j]), float4(weight_y));
half4 weight_z = weights[weithTo + 2 * kernelHXW * input_arr_size + j * input_arr_size + i];
output.z += dot(float4(input[j]), float4(weight_z));
half4 weight_w = weights[weithTo + 3 * kernelHXW * input_arr_size + j * input_arr_size + i];
output.w += dot(float4(input[j]), float4(weight_w));
}
}
outTexture.write(half4(output), gid.xy, gid.z);
}
kernel void depthwise_conv_3x3_half(texture2d_array<half, access::sample> inTexture [[texture(0)]],
texture2d_array<half, access::write> outTexture [[texture(1)]],
constant MetalConvParam &param [[buffer(0)]],
const device half *weights [[buffer(1)]],
uint3 gid [[thread_position_in_grid]]) {
if (gid.x >= outTexture.get_width() ||
gid.y >= outTexture.get_height() ||
gid.z >= outTexture.get_array_size()) {
return;
}
uint output_slice = gid.z;
ushort2 stride = ushort2(param.strideX, param.strideY);
ushort2 posInInput = ushort2(gid.xy) * stride + ushort2(param.offsetX, param.offsetY);
constexpr sampler sample(coord::pixel, filter::nearest, address::clamp_to_zero);
const uint kernelHXW = 9;
uint weithTo = gid.z * kernelHXW * 4;
float4 output = float4(0.0);
half4 inputs[9];
inputs[0] = inTexture.sample(sample, float2(posInInput.x - 1, posInInput.y - 1), output_slice);
inputs[1] = inTexture.sample(sample, float2(posInInput.x, posInInput.y - 1), output_slice);
inputs[2] = inTexture.sample(sample, float2(posInInput.x + 1, posInInput.y - 1), output_slice);
inputs[3] = inTexture.sample(sample, float2(posInInput.x - 1, posInInput.y), output_slice);
inputs[4] = inTexture.sample(sample, float2(posInInput.x, posInInput.y), output_slice);
inputs[5] = inTexture.sample(sample, float2(posInInput.x + 1, posInInput.y), output_slice);
inputs[6] = inTexture.sample(sample, float2(posInInput.x - 1, posInInput.y + 1), output_slice);
inputs[7] = inTexture.sample(sample, float2(posInInput.x, posInInput.y + 1), output_slice);
inputs[8] = inTexture.sample(sample, float2(posInInput.x + 1, posInInput.y + 1), output_slice);
for (int j = 0; j < 9; ++j) {
half4 input = inputs[j];
output.x += float(input.x) * float(weights[weithTo + 0 * kernelHXW + j]);
output.y += float(input.y) * float(weights[weithTo + 1 * kernelHXW + j]);
output.z += float(input.z) * float(weights[weithTo + 2 * kernelHXW + j]);
output.w += float(input.w) * float(weights[weithTo + 3 * kernelHXW + j]);
}
outTexture.write(half4(output), gid.xy, gid.z);
}
kernel void conv_1x1_half(texture2d_array<half, access::sample> inTexture [[texture(0)]],
texture2d_array<half, access::write> outTexture [[texture(1)]],
constant MetalConvParam &param [[buffer(0)]],
const device half4 *weights [[buffer(1)]],
uint3 gid [[thread_position_in_grid]]) {
if (gid.x >= outTexture.get_width() ||
gid.y >= outTexture.get_height() ||
gid.z >= outTexture.get_array_size()) {
return;
}
ushort2 stride = ushort2(param.strideX, param.strideY);
ushort2 posInInput = ushort2(gid.xy) * stride + ushort2(param.offsetX, param.offsetY);
constexpr sampler sample(coord::pixel, filter::nearest, address::clamp_to_zero);
const uint kernelHXW = 1;
uint input_arr_size = inTexture.get_array_size();
uint weithTo = gid.z * kernelHXW * input_arr_size * 4;
float4 output = float4(0.0);
half4 input;
for (uint i = 0; i < input_arr_size; ++i) {
input = inTexture.sample(sample, float2(posInInput.x, posInInput.y), i);
half4 weight_x = weights[weithTo + 0 * kernelHXW * input_arr_size + i];
output.x += dot(float4(input), float4(weight_x));
half4 weight_y = weights[weithTo + 1 * kernelHXW * input_arr_size + i];
output.y += dot(float4(input), float4(weight_y));
half4 weight_z = weights[weithTo + 2 * kernelHXW * input_arr_size + i];
output.z += dot(float4(input), float4(weight_z));
half4 weight_w = weights[weithTo + 3 * kernelHXW * input_arr_size + i];
output.w += dot(float4(input), float4(weight_w));
}
outTexture.write(half4(output), gid.xy, gid.z);
}
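The `conv_3x3_half` and `conv_1x1_half` kernels above address the `half4` weight buffer with the expression `weithTo + k * kernelHXW * input_arr_size + j * input_arr_size + i`. The C++ sketch below isolates that arithmetic so the layout can be checked on the CPU. The function name `weightIndex` and its parameter names are hypothetical; the formula itself is copied from the kernels.

```cpp
#include <cassert>
#include <cstdint>

// Flat index into the half4 weight buffer, following the kernels:
//   weithTo = outSlice * kernelHXW * inputSlices * 4
//   index   = weithTo + k * kernelHXW * inputSlices + j * inputSlices + i
// outSlice    : gid.z, the output texture slice (four channels per slice)
// k           : 0..3, which of the slice's four output channels (x, y, z, w)
// j           : kernel tap, 0..kernelHXW-1 (the 1x1 kernel has a single tap)
// i           : input texture slice, 0..inputSlices-1
uint32_t weightIndex(uint32_t outSlice, uint32_t k, uint32_t j, uint32_t i,
                     uint32_t kernelHXW, uint32_t inputSlices) {
    assert(k < 4 && j < kernelHXW && i < inputSlices);
    const uint32_t weithTo = outSlice * kernelHXW * inputSlices * 4;
    return weithTo + k * kernelHXW * inputSlices + j * inputSlices + i;
}
```

Under this formula the weights for consecutive output slices are packed back to back: the last index used by slice 0 is immediately followed by the first index of slice 1.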
metal/paddle-mobile/paddle-mobile/Operators/Kernels/metal/ConvTransposeKernel.metal
View file @
f9a87f61
...
...
@@ -29,11 +29,11 @@ struct MetalConvTransposeParam{
ushort dilationY;
};
kernel void conv_transpose(texture2d_array<float, access::sample> inTexture [[texture(0)]],
texture2d_array<float, access::write> outTexture [[texture(1)]],
constant MetalConvTransposeParam &param [[buffer(0)]],
const device float4 *weights [[buffer(1)]],
uint3 gid [[thread_position_in_grid]])
{
kernel void conv_transpose2x2_stride2(texture2d_array<float, access::sample> inTexture [[texture(0)]],
texture2d_array<float, access::write> outTexture [[texture(1)]],
constant MetalConvTransposeParam &param [[buffer(0)]],
const device float4 *weights [[buffer(1)]],
uint3 gid [[thread_position_in_grid]])
{
if (gid.x >= outTexture.get_width() ||
gid.y >= outTexture.get_height() ||
gid.z >= outTexture.get_array_size()) {
...
...
@@ -41,48 +41,134 @@ kernel void conv_transpose(texture2d_array<float, access::sample> inTexture [[te
}
int input_array_size = inTexture.get_array_size();
uint kernel_one_output_slice = input_array_size * param.kernelW * param.kernelH;
uint kernel_stride_z = gid.z * 4 * (kernel_one_output_slice);
int kernel_index_x = gid.x % 2;
int kernel_index_y = gid.y % 2;
int kernel_index = kernel_index_y * 2 + kernel_index_x;
int kernel_to = gid.z * input_array_size * 4 * 4 + (kernel_index * input_array_size);
int input_x = gid.x / 2;
int input_y = gid.y / 2;
constexpr sampler sample(coord::pixel, filter::nearest, address::clamp_to_zero);
float4 output = float4(0.0);
for (int i = 0; i < input_array_size; ++i) {
float4 input = inTexture.sample(sample, float2(input_x, input_y), i);
float4 kernel_slice0 = weights[kernel_to + input_array_size * 4 * 0 + i];
float4 kernel_slice1 = weights[kernel_to + input_array_size * 4 * 1 + i];
float4 kernel_slice2 = weights[kernel_to + input_array_size * 4 * 2 + i];
float4 kernel_slice3 = weights[kernel_to + input_array_size * 4 * 3 + i];
output.x += dot(input, kernel_slice0);
output.y += dot(input, kernel_slice1);
output.z += dot(input, kernel_slice2);
output.w += dot(input, kernel_slice3);
}
outTexture.write(output, gid.xy, gid.z);
}
kernel void conv_transpose2x2_stride2_half(texture2d_array<half, access::sample> inTexture [[texture(0)]],
texture2d_array<half, access::write> outTexture [[texture(1)]],
constant MetalConvTransposeParam &param [[buffer(0)]],
const device half4 *weights [[buffer(1)]],
uint3 gid [[thread_position_in_grid]]) {
if (gid.x >= outTexture.get_width() ||
gid.y >= outTexture.get_height() ||
gid.z >= outTexture.get_array_size()) {
return;
}
float4 output;
int input_array_size = inTexture.get_array_size();
int kernel_index_x = gid.x % 2;
int kernel_index_y = gid.y % 2;
int kernel_index = kernel_index_y * 2 + kernel_index_x;
int kernel_to = gid.z * input_array_size * 4 * 4 + (kernel_index * input_array_size);
int input_x = gid.x / 2;
int input_y = gid.y / 2;
for (int w = 0; w < param.kernelW; ++w) {
int input_x = (gid.x - w * param.dilationX + param.paddingX) / param.strideX;
if (input_x < 0 || input_x >= int(inTexture.get_width())) {
continue;
}
constexpr sampler sample(coord::pixel, filter::nearest, address::clamp_to_zero);
float4 output = float4(0.0);
for (int i = 0; i < input_array_size; ++i) {
for (int h = 0; h < param.kernelH; ++h) {
int input_y = (gid.y - h * param.dilationY + param.paddingY) / param.strideY;
if (input_y < 0 || input_y >= int(inTexture.get_height())) {
continue;
}
uint kernel_index = (w * param.kernelH + h) * inTexture.get_array_size();
for (int slice = 0; slice < input_array_size; ++slice) {
float4 input;
float4 kernel_slice = weights[kernel_stride_z + 0 * kernel_one_output_slice + kernel_index + slice];
float4 kernel_slice1 = weights[kernel_stride_z + 1 * kernel_one_output_slice + kernel_index + slice];
float4 kernel_slice2 = weights[kernel_stride_z + 2 * kernel_one_output_slice + kernel_index + slice];
float4 kernel_slice3 = weights[kernel_stride_z + 3 * kernel_one_output_slice + kernel_index + slice];
input = inTexture.sample(sample, float2(input_x, input_x), slice);
output.x += dot(input, kernel_slice);
output.x += dot(input, kernel_slice1);
output.x += dot(input, kernel_slice2);
output.x += dot(input, kernel_slice3);
}
}
half4 input = inTexture.sample(sample, float2(input_x, input_y), i);
half4 kernel_slice0 = weights[kernel_to + input_array_size * 4 * 0 + i];
half4 kernel_slice1 = weights[kernel_to + input_array_size * 4 * 1 + i];
half4 kernel_slice2 = weights[kernel_to + input_array_size * 4 * 2 + i];
half4 kernel_slice3 = weights[kernel_to + input_array_size * 4 * 3 + i];
output.x += dot(float4(input), float4(kernel_slice0));
output.y += dot(float4(input), float4(kernel_slice1));
output.z += dot(float4(input), float4(kernel_slice2));
output.w += dot(float4(input), float4(kernel_slice3));
}
outTexture.write(output, gid.xy, gid.z);
outTexture.write(half4(output), gid.xy, gid.z);
}
//kernel void conv_transpose(texture2d_array<float, access::sample> inTexture [[texture(0)]],
// texture2d_array<float, access::write> outTexture [[texture(1)]],
// constant MetalConvTransposeParam &param [[buffer(0)]],
// const device float4 *weights [[buffer(1)]],
// uint3 gid [[thread_position_in_grid]]){
// if (gid.x >= outTexture.get_width() ||
// gid.y >= outTexture.get_height() ||
// gid.z >= outTexture.get_array_size()) {
// return;
// }
//
// int input_array_size = inTexture.get_array_size();
//
// uint kernel_one_output_slice = input_array_size * param.kernelW * param.kernelH;
//
// uint kernel_stride_z = gid.z * 4 * (kernel_one_output_slice);
//
// constexpr sampler sample(coord::pixel, filter::nearest, address::clamp_to_zero);
//
// float4 output;
//
// for (int w = 0; w < param.kernelW; ++w) {
// int top = gid.x - w * param.dilationX + param.paddingX;
// int input_x = top / param.strideX;
// if (top < 0 || input_x >= int(inTexture.get_width())) {
// continue;
// }
//
// for (int h = 0; h < param.kernelH; ++h) {
// int top_y = gid.y - h * param.dilationY + param.paddingY;
// int input_y = top_y / param.strideY;
// if (top_y < 0 || input_y >= int(inTexture.get_height())) {
// continue;
// }
//
// uint kernel_index = (w * param.kernelH + h) * inTexture.get_array_size();
//
// for (int slice = 0; slice < input_array_size; ++slice) {
//
// float4 input;
// float4 kernel_slice = weights[kernel_stride_z + 0 * kernel_one_output_slice + kernel_index + slice];
// float4 kernel_slice1 = weights[kernel_stride_z + 1 * kernel_one_output_slice + kernel_index + slice];
//
// float4 kernel_slice2 = weights[kernel_stride_z + 2 * kernel_one_output_slice + kernel_index + slice];
//
// float4 kernel_slice3 = weights[kernel_stride_z + 3 * kernel_one_output_slice + kernel_index + slice];
//
// input = inTexture.sample(sample, float2(input_x, input_y), slice);
// output.x += dot(input, kernel_slice);
// output.y += dot(input, kernel_slice1);
// output.z += dot(input, kernel_slice2);
// output.w += dot(input, kernel_slice3);
// }
// }
// }
//
// outTexture.write(output, gid.xy, gid.z);
//}
//
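The new `conv_transpose2x2_stride2` kernels replace the general loop over kernel taps with a direct mapping: each output pixel reads exactly one input pixel (`gid / 2`) and one of the four kernel positions (`gid % 2`). The C++ sketch below restates that mapping for a single thread. `TransposeTap` and `mapOutputThread` are hypothetical names introduced here; the arithmetic follows the `kernel_index` and `kernel_to` lines in the kernels.

```cpp
#include <cassert>

// Where one output thread of the 2x2, stride-2 transposed convolution reads.
struct TransposeTap {
    int inputX;       // gid.x / 2
    int inputY;       // gid.y / 2
    int kernelIndex;  // (gid.y % 2) * 2 + (gid.x % 2), in 0..3
    int kernelTo;     // base offset into the weight buffer
};

// gx, gy, gz correspond to gid.x, gid.y, gid.z; inputSlices corresponds to
// inTexture.get_array_size().
TransposeTap mapOutputThread(int gx, int gy, int gz, int inputSlices) {
    assert(gx >= 0 && gy >= 0 && gz >= 0 && inputSlices > 0);
    TransposeTap tap;
    tap.inputX = gx / 2;
    tap.inputY = gy / 2;
    tap.kernelIndex = (gy % 2) * 2 + (gx % 2);
    tap.kernelTo = gz * inputSlices * 4 * 4 + tap.kernelIndex * inputSlices;
    return tap;
}
```

In the kernels, the weight for output channel `c` (0 to 3) and input slice `i` is then read at `kernelTo + inputSlices * 4 * c + i`.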
metal/paddle-mobile/paddle-mobile/Operators/Kernels/metal/Elementwise.metal
View file @
f9a87f61
...
...
@@ -18,6 +18,7 @@
using namespace metal;
struct ElementwiseAddParam {
int32_t unsafe_one_dim;
int32_t fast;
int32_t axis;
int32_t yoff;
...
...
@@ -36,7 +37,10 @@ kernel void elementwise_add(texture2d_array<float, access::read> inputX [[textur
gid.y >= outTexture.get_height() ||
gid.z >= outTexture.get_array_size()) return;
float4 rx, ry;
if (pm.fast == 1) {
if (pm.unsafe_one_dim == 1) {
rx = inputX.read(gid.xy, gid.z);
ry = inputY.read(uint2(0, 0), gid.z);
} else if (pm.fast == 1) {
rx = inputX.read(gid.xy, gid.z);
ry = inputY.read(gid.xy, gid.z);
} else {
...
...
@@ -59,3 +63,39 @@ kernel void elementwise_add(texture2d_array<float, access::read> inputX [[textur
float4 r = rx + ry;
outTexture.write(r, gid.xy, gid.z);
}
kernel void elementwise_add_half(texture2d_array<half, access::read> inputX [[texture(0)]],
texture2d_array<half, access::read> inputY [[texture(1)]],
texture2d_array<half, access::write> outTexture [[texture(2)]],
constant ElementwiseAddParam &pm [[buffer(0)]],
uint3 gid [[thread_position_in_grid]]) {
if (gid.x >= outTexture.get_width() ||
gid.y >= outTexture.get_height() ||
gid.z >= outTexture.get_array_size()) return;
half4 rx, ry;
if (pm.unsafe_one_dim == 1) {
rx = inputX.read(gid.xy, gid.z);
ry = inputY.read(uint2(0, 0), gid.z);
} else if (pm.fast == 1) {
rx = inputX.read(gid.xy, gid.z);
ry = inputY.read(gid.xy, gid.z);
} else {
rx = inputX.read(gid.xy, gid.z);
int32_t x_xyzn[4] = {int32_t(gid.x), int32_t(gid.y), int32_t(gid.z), 0}, x_abcd[4], t_abcd[4];
int32_t y_abcd[4] = {1, 1, 1, 1}, y_xyzn[4];
int32_t xtrans[4] = {pm.xtrans[0], pm.xtrans[1], pm.xtrans[2], pm.xtrans[3]};
int32_t ytrans[4] = {pm.ytrans[0], pm.ytrans[1], pm.ytrans[2], pm.ytrans[3]};
for (int n = 0; n < 4; n++) {
xyzn2abcd(pm.xdim[3], x_xyzn, x_abcd);
invtrans(xtrans, x_abcd, t_abcd);
for (int k = pm.axis; k < (4 - pm.yoff); k++) {
y_abcd[k+pm.yoff] = t_abcd[k];
}
trans(ytrans, y_abcd, t_abcd);
abcd2xyzn(pm.ydim[3], t_abcd, y_xyzn);
ry[n] = inputY.read(uint2(y_xyzn[0], y_xyzn[1]), y_xyzn[2])[y_xyzn[3]];
}
}
half4 r = rx + ry;
outTexture.write(r, gid.xy, gid.z);
}
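Both `elementwise_add` kernels now test `unsafe_one_dim` before `fast`, so when the Swift side sets both flags the single-element path wins. The C++ sketch below captures only that branch order and the `inputY` coordinate read on the two simple paths. The names `AddPath`, `ElementwiseAddFlags`, `selectPath` and `yReadCoord` are hypothetical; the general path with `xyzn2abcd` and the transposes is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Which branch of the elementwise_add kernels a thread takes.
enum class AddPath { OneDim, Fast, General };

// Only the two flags that drive branch selection (illustrative subset of
// the ElementwiseAddParam struct).
struct ElementwiseAddFlags {
    int32_t unsafe_one_dim;
    int32_t fast;
};

// Same order as the kernels: unsafe_one_dim is checked first, then fast.
AddPath selectPath(const ElementwiseAddFlags& pm) {
    if (pm.unsafe_one_dim == 1) return AddPath::OneDim;
    if (pm.fast == 1) return AddPath::Fast;
    return AddPath::General;
}

struct Coord {
    unsigned x;
    unsigned y;
};

// Coordinate in inputY read by the thread at (x, y) on the simple paths:
// (0, 0) for the one-dim path, (x, y) for the fast path.
Coord yReadCoord(AddPath path, unsigned x, unsigned y) {
    assert(path != AddPath::General);
    return path == AddPath::OneDim ? Coord{0u, 0u} : Coord{x, y};
}
```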
metal/paddle-mobile/paddle-mobile/Operators/Kernels/metal/PreluKernel.metal
View file @
f9a87f61
...
...
@@ -81,3 +81,71 @@ kernel void prelu_other(texture2d_array<float, access::sample> inTexture [[textu
outTexture.write(output, gid.xy, gid.z);
}
kernel void prelu_channel_half(texture2d_array<half, access::sample> inTexture [[texture(0)]],
texture2d_array<half, access::write> outTexture [[texture(1)]],
const device half4 *alpha [[buffer(0)]],
uint3 gid [[thread_position_in_grid]]){
if (gid.x >= outTexture.get_width() ||
gid.y >= outTexture.get_height() ||
gid.z >= outTexture.get_array_size()) {
return;
}
constexpr sampler sample(coord::pixel, filter::nearest, address::clamp_to_zero);
half4 input = inTexture.sample(sample, float2(gid.x, gid.y), gid.z);
half4 alpha_value = alpha[gid.z];
half4 output;
output.x = input.x > 0 ? input.x : (alpha_value.x * input.x);
output.y = input.y > 0 ? input.y : (alpha_value.y * input.y);
output.z = input.z > 0 ? input.z : (alpha_value.z * input.z);
output.w = input.w > 0 ? input.w : (alpha_value.w * input.w);
outTexture.write(output, gid.xy, gid.z);
}
kernel void prelu_element_half(texture2d_array<half, access::sample> inTexture [[texture(0)]],
texture2d_array<half, access::write> outTexture [[texture(1)]],
const device half4 *alpha [[buffer(0)]],
uint3 gid [[thread_position_in_grid]]){
if (gid.x >= outTexture.get_width() ||
gid.y >= outTexture.get_height() ||
gid.z >= outTexture.get_array_size()) {
return;
}
constexpr sampler sample(coord::pixel, filter::nearest, address::clamp_to_zero);
half4 input = inTexture.sample(sample, float2(gid.x, gid.y), gid.z);
int alpha_to = (gid.y * inTexture.get_width() + gid.x) * inTexture.get_array_size();
half4 alpha_value = alpha[alpha_to + gid.z];
half4 output;
output.x = input.x > 0 ? input.x : (alpha_value.x * input.x);
output.y = input.y > 0 ? input.y : (alpha_value.y * input.y);
output.z = input.z > 0 ? input.z : (alpha_value.z * input.z);
output.w = input.w > 0 ? input.w : (alpha_value.w * input.w);
outTexture.write(output, gid.xy, gid.z);
}
kernel void prelu_other_half(texture2d_array<half, access::sample> inTexture [[texture(0)]],
texture2d_array<half, access::write> outTexture [[texture(1)]],
const device half *alpha [[buffer(0)]],
uint3 gid [[thread_position_in_grid]]){
if (gid.x >= outTexture.get_width() ||
gid.y >= outTexture.get_height() ||
gid.z >= outTexture.get_array_size()) {
return;
}
constexpr sampler sample(coord::pixel, filter::nearest, address::clamp_to_zero);
half4 input = inTexture.sample(sample, float2(gid.x, gid.y), gid.z);
half alpha_value = alpha[0];
half4 output;
output.x = input.x > 0 ? input.x : (alpha_value * input.x);
output.y = input.y > 0 ? input.y : (alpha_value * input.y);
output.z = input.z > 0 ? input.z : (alpha_value * input.z);
output.w = input.w > 0 ? input.w : (alpha_value * input.w);
outTexture.write(output, gid.xy, gid.z);
}
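The three half-precision PReLU kernels differ only in how they look up alpha: per slice in `prelu_channel_half`, per pixel and slice in `prelu_element_half`, and a single scalar in `prelu_other_half`. The C++ sketch below shows the shared activation and the element-mode index on plain floats. `prelu` and `elementAlphaIndex` are hypothetical helper names; the index formula is copied from the `alpha_to` line in `prelu_element_half`.

```cpp
#include <cassert>
#include <cstdint>

// PReLU on one component: positive values pass through,
// everything else is scaled by alpha.
float prelu(float x, float alpha) {
    return x > 0.0f ? x : alpha * x;
}

// Index of the half4 alpha entry used in "element" mode:
//   alpha_to = (gid.y * width + gid.x) * arraySize, then + gid.z
// In "channel" mode the index is simply gid.z; in the remaining mode it is 0.
uint32_t elementAlphaIndex(uint32_t x, uint32_t y, uint32_t z,
                           uint32_t width, uint32_t arraySize) {
    assert(x < width && z < arraySize);
    return (y * width + x) * arraySize + z;
}
```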
metal/paddle-mobile/paddle-mobile/Operators/PoolOp.swift
View file @
f9a87f61
...
...
@@ -60,7 +60,7 @@ class PoolOp<P: PrecisionType>: Operator<PoolKernel<P>, PoolParam<P>>, Runable,
    func delogOutput() {
        print("\(type) output: ")
        print(para.output.metalTexture.toTensor(dim: (n: para.output.tensorDim[0], c: para.output.tensorDim[1], h: para.output.tensorDim[2], w: para.output.tensorDim[3]), texturePrecision: computePrecision).strideArray())
        print(para.output.metalTexture.toTensor(dim: (n: para.output.tensorDim[0], c: para.output.tensorDim[1], h: para.output.tensorDim[2], w: para.output.tensorDim[3])).strideArray())
// print("pool2d delog")
...
...
metal/paddle-mobile/paddle-mobile/Operators/PreluOp.swift
View file @
f9a87f61
...
...
@@ -51,13 +51,13 @@ class PreluOp<P: PrecisionType>: Operator<PreluKernel<P>, PreluParam<P>>, Runabl
    func delogOutput() {
        print("\(type) input: ")
        print(para.input.metalTexture.toTensor(dim: (n: para.input.originDim[0], c: para.input.originDim[1], h: para.input.originDim[2], w: para.input.originDim[3]), texturePrecision: computePrecision).strideArray())
        print(para.input.metalTexture.toTensor(dim: (n: para.input.padToFourDim[0], c: para.input.padToFourDim[1], h: para.input.padToFourDim[2], w: para.input.padToFourDim[3])).strideArray())
        print("\(type) Alpha: ")
        let _: Float32? = para.alpha.buffer.logDesc(header: " alpha: ", stridable: false)
        print("\(type) output: ")
        print(para.output.metalTexture.toTensor(dim: (n: para.output.originDim[0], c: para.output.originDim[1], h: para.output.originDim[2], w: para.output.originDim[3]), texturePrecision: computePrecision).strideArray())
        print(para.output.metalTexture.toTensor(dim: (n: para.output.padToFourDim[0], c: para.output.padToFourDim[1], h: para.output.padToFourDim[2], w: para.output.padToFourDim[3])).strideArray())
    }
// print("softmax delog")
...
...
metal/paddle-mobile/paddle-mobile/Operators/PriorBoxOp.swift
...
...
@@ -76,12 +76,12 @@ class PriorBoxOp<P: PrecisionType>: Operator<PriorBoxKernel<P>, PriorBoxParam<P>
         print(outputArray)
         // output
         // print(" \(type) output: ")
-        // let originDim = para.output.originDim
+        // let padToFourDim = para.output.padToFourDim
         // if para.output.transpose == [0, 1, 2, 3] {
-        // let outputArray: [Float32] = para.output.metalTexture.realNHWC(dim: (n: originDim[0], h: originDim[1], w: originDim[2], c: originDim[3]), texturePrecision: computePrecision)
+        // let outputArray: [Float32] = para.output.metalTexture.realNHWC(dim: (n: padToFourDim[0], h: padToFourDim[1], w: padToFourDim[2], c: padToFourDim[3]), texturePrecision: computePrecision)
         // print(outputArray.strideArray())
         // } else if para.output.transpose == [0, 2, 3, 1] {
-        // print(para.output.metalTexture.toTensor(dim: (n: originDim[0], c: originDim[1], h: originDim[2], w: originDim[3]), texturePrecision: computePrecision).strideArray())
+        // print(para.output.metalTexture.toTensor(dim: (n: padToFourDim[0], c: padToFourDim[1], h: padToFourDim[2], w: padToFourDim[3]), texturePrecision: computePrecision).strideArray())
         // } else {
         // print(" not implement")
         // }
...
...
metal/paddle-mobile/paddle-mobile/Operators/ReluOp.swift
...
...
@@ -46,7 +46,7 @@ class ReluOp<P: PrecisionType>: Operator<ReluKernel<P>, ReluParam<P>>, Runable,
     func delogOutput() {
         print(" \(type) output: ")
-        print(para.output.metalTexture.toTensor(dim: (n: para.output.tensorDim[0], c: para.output.tensorDim[1], h: para.output.tensorDim[2], w: para.output.tensorDim[3]), texturePrecision: computePrecision).strideArray())
+        print(para.output.metalTexture.toTensor(dim: (n: para.output.tensorDim[0], c: para.output.tensorDim[1], h: para.output.tensorDim[2], w: para.output.tensorDim[3])).strideArray())
     }
 }
...
...
metal/paddle-mobile/paddle-mobile/Operators/ReshapeOp.swift
...
...
@@ -41,8 +41,8 @@ class ReshapeParam<P: PrecisionType>: OpParam {
             for i in 0..<s.count {
                 dim[4 - s.count + i] = s[i]
             }
-            output.originDim = Dim.init(inDim: dim)
-            output.dim = output.originDim
+            output.padToFourDim = Dim.init(inDim: dim)
+            output.dim = output.padToFourDim
             inplace = try ReshapeParam.getAttr(key: "inplace", attrs: opDesc.attrs)
         } catch let error {
...
...
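Review note: the loop in `ReshapeParam` right-aligns the requested shape `s` inside a four-slot array before it is stored as `padToFourDim`. A standalone sketch of that step follows. The helper name `padToFour` is hypothetical, and the assumption that the unused leading slots hold 1 is mine: the initial value of `dim` is outside the visible hunk.

```swift
// Hypothetical helper mirroring the right-alignment loop in ReshapeParam.
// Assumption: leading slots default to 1 (the real initial value is not shown in the diff).
func padToFour(_ shape: [Int]) -> [Int] {
    precondition(shape.count <= 4, "this sketch supports at most four dimensions")
    var dim = [1, 1, 1, 1]
    for i in 0..<shape.count {
        // Same index arithmetic as the hunk above: dim[4 - s.count + i] = s[i]
        dim[4 - shape.count + i] = shape[i]
    }
    return dim
}

print(padToFour([2, 3]))
```

Under that assumption a two-dimensional shape `[2, 3]` becomes `[1, 1, 2, 3]`, and a shape that already has four entries is returned unchanged.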
@@ -74,9 +74,9 @@ class ReshapeOp<P: PrecisionType>: Operator<ReshapeKernel<P>, ReshapeParam<P>>,
         print("reshape delog")
         // let _: P? = para.input.metalTexture.logDesc(header: "reshape input: ", stridable: false)
-        let originDim = para.output.originDim
+        let padToFourDim = para.output.padToFourDim
-        let outputArray: [Float32] = para.output.metalTexture.realNHWC(dim: (n: originDim[0], h: originDim[1], w: originDim[2], c: originDim[3]), texturePrecision: computePrecision)
+        let outputArray: [Float32] = para.output.metalTexture.realNHWC(dim: (n: padToFourDim[0], h: padToFourDim[1], w: padToFourDim[2], c: padToFourDim[3]))
         print(outputArray.strideArray())
     }
...
...
metal/paddle-mobile/paddle-mobile/Operators/SoftmaxOp.swift
...
...
@@ -26,7 +26,7 @@ class SoftmaxParam<P: PrecisionType>: OpParam {
             output.dim = input.dim
             output.tensorDim = input.tensorDim
-            output.originDim = input.originDim
+            output.padToFourDim = input.padToFourDim
         } catch let error {
             throw error
         }
...
...
@@ -52,9 +52,11 @@ class SoftmaxOp<P: PrecisionType>: Operator<SoftmaxKernel<P>, SoftmaxParam<P>>,
     func delogOutput() {
         print("softmax delog")
-        let originDim = para.output.originDim
-        let outputArray: [Float32] = para.output.metalTexture.realNHWC(dim: (n: originDim[0], h: originDim[1], w: originDim[2], c: originDim[3]), texturePrecision: computePrecision)
+        print(para.input)
+        print(para.output)
+        let padToFourDim = para.output.padToFourDim
+        let outputArray: [Float32] = para.output.metalTexture.realNHWC(dim: (n: padToFourDim[0], h: padToFourDim[1], w: padToFourDim[2], c: padToFourDim[3]))
         print(outputArray.strideArray())
     }
 }
metal/paddle-mobile/paddle-mobile/Operators/TransposeOp.swift
...
...
@@ -48,9 +48,9 @@ class TransposeOp<P: PrecisionType>: Operator<TransposeKernel<P>, TransposeParam
     func delogOutput() {
         print(" \(type) output: ")
-        let originDim = para.output.originDim
+        let padToFourDim = para.output.padToFourDim
         if para.output.transpose == [0, 1, 2, 3] {
-            let outputArray = para.output.metalTexture.realNHWC(dim: (n: originDim[0], h: originDim[1], w: originDim[2], c: originDim[3]))
+            let outputArray = para.output.metalTexture.realNHWC(dim: (n: padToFourDim[0], h: padToFourDim[1], w: padToFourDim[2], c: padToFourDim[3]))
             print(outputArray.strideArray())
         } else if para.output.transpose == [0, 2, 3, 1] {
             print(para.output.metalTexture.toTensor(dim: (n: para.output.tensorDim[0], c: para.output.tensorDim[1], h: para.output.tensorDim[2], w: para.output.tensorDim[3])).strideArray())
...
...
metal/paddle-mobile/paddle-mobile/framework/Tensor.swift
...
...
@@ -95,7 +95,28 @@ class Tensor<P: PrecisionType>: Tensorial {
-    func initBuffer(device: MTLDevice, precision: ComputePrecision = .Float16) {
+    func initBuffer(device: MTLDevice, precision: ComputePrecision = .Float16, convertToNHWC: Bool = true, withTranspose: Bool = false) {
+        if convertToNHWC {
             // print(layout)
             convert(to: DataLayout.NHWC())
+        }
+        if withTranspose {
+            let transposePointer = UnsafeMutablePointer<P>.allocate(capacity: numel())
+            let n = dim[0]
+            let hwc = numel() / n
+            for j in 0..<hwc {
+                for i in 0..<n {
+                    //data[i * hwc + j]
+                    transposePointer[j * n + i] = data[i * hwc + j]
+                }
+            }
+            dim.swapeDimAt(index1: 0, index2: 3)
+            data.release()
+            data.pointer = transposePointer
+        }
         guard let floatPointer = data.pointer as? UnsafeMutablePointer<Float32> else {
             fatalError(" not support yet ")
         }
...
...
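Review note: the new `withTranspose` branch treats the data as an `n` by `hwc` matrix and writes its transpose, so element `(i, j)` moves from offset `i * hwc + j` to offset `j * n + i`. The sketch below reproduces that index mapping on a plain array. The function name `transposeLeadingAxis` is hypothetical, and the sketch copies into a new array instead of using the tensor's pointer storage.

```swift
// Hypothetical array-based version of the withTranspose loop (illustration only).
// Input is read as n rows of hwc elements; output holds hwc rows of n elements.
func transposeLeadingAxis(_ data: [Float], n: Int) -> [Float] {
    precondition(n > 0 && data.count % n == 0, "element count must be divisible by n")
    let hwc = data.count / n
    var out = [Float](repeating: 0, count: data.count)
    for j in 0..<hwc {
        for i in 0..<n {
            // Same mapping as the diff: transposePointer[j * n + i] = data[i * hwc + j]
            out[j * n + i] = data[i * hwc + j]
        }
    }
    return out
}

print(transposeLeadingAxis([1, 2, 3, 4, 5, 6], n: 2))
```

With `n = 2` the two rows `[1, 2, 3]` and `[4, 5, 6]` are interleaved into `[1, 4, 2, 5, 3, 6]`. In the commit the dimension record is then updated separately with `dim.swapeDimAt(index1: 0, index2: 3)`.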
@@ -139,6 +160,8 @@ class Tensor<P: PrecisionType>: Tensorial {
                     for j in 0..<paddedC {
                         if j < C {
                             dstPtr[j] = tmpPointer[j]
+                        } else {
+                            dstPtr[j] = 0
                         }
                     }
                     tmpPointer += C
...
...
@@ -152,6 +175,47 @@ class Tensor<P: PrecisionType>: Tensorial {
                     float32ToFloat16(input: convertedPointer, output: buffer.contents(), count: count)
                 }
                 convertedPointer.deinitialize(count: count)
                 convertedPointer.deallocate()
             }
+        } else {
+            let C = dim[3]
+            let cSlices = (C + 3) / 4
+            let paddedC = cSlices * 4
+            let count = paddedC * dim[0] * dim[1] * dim[2]
+            if C == paddedC {
+                buffer = device.makeBuffer(length: count * precisionSize)
+                switch precision {
+                case .Float32:
+                    buffer?.contents().copyMemory(from: data.pointer, byteCount: count * MemoryLayout<P>.stride)
+                case .Float16:
+                    float32ToFloat16(input: floatPointer, output: buffer.contents(), count: count)
+                }
+            } else if C == 1 {
+                fatalError(" not support ")
+            } else {
+                buffer = device.makeBuffer(length: count * precisionSize)
+                let convertedPointer = UnsafeMutablePointer<Float32>.allocate(capacity: count)
+                var tmpPointer = floatPointer
+                var dstPtr = convertedPointer
+                for _ in 0..<dim[0] * dim[1] * dim[2] {
+                    for j in 0..<paddedC {
+                        if j < C {
+                            dstPtr[j] = tmpPointer[j]
+                        } else {
+                            dstPtr[j] = 0
+                        }
+                    }
+                    tmpPointer += C
+                    dstPtr += paddedC
+                }
+                switch precision {
+                case .Float32:
+                    buffer?.contents().copyMemory(from: convertedPointer, byteCount: count * MemoryLayout<P>.stride)
+                case .Float16:
+                    float32ToFloat16(input: convertedPointer, output: buffer.contents(), count: count)
+                }
+                convertedPointer.deinitialize(count: count)
+                convertedPointer.deallocate()
+            }
...
...
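Review note: the added branch in `initBuffer` rounds the channel count `C` up to a multiple of four (`paddedC = (C + 3) / 4 * 4`) and zero-fills the extra lanes for every pixel. The following sketch shows the same padding on a plain array of channel-last data. The name `padChannels` is hypothetical, and the sketch skips the Metal buffer and the Float16 conversion entirely.

```swift
// Hypothetical array-based version of the channel padding loop (illustration only).
// Input is channel-last: each pixel contributes c consecutive values.
func padChannels(_ data: [Float], channels c: Int) -> [Float] {
    precondition(c > 0 && data.count % c == 0, "element count must be divisible by c")
    let paddedC = (c + 3) / 4 * 4          // round up to a multiple of 4
    let pixels = data.count / c
    // Zero-initialised, so the padding lanes need no explicit write.
    var out = [Float](repeating: 0, count: pixels * paddedC)
    for p in 0..<pixels {
        for j in 0..<c {
            out[p * paddedC + j] = data[p * c + j]
        }
    }
    return out
}

print(padChannels([1, 2, 3, 4, 5, 6], channels: 3))
```

Two pixels with three channels each become two pixels with four lanes each, the fourth lane being zero. When `c` is already a multiple of four the data comes back unchanged, which corresponds to the `C == paddedC` fast path in the diff.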
metal/paddle-mobile/paddle-mobile/framework/Texture.swift
...
...
@@ -41,14 +41,28 @@ extension InputTexture {
 public class Texture<P: PrecisionType>: Tensorial {
     var dim: Dim
     public var tensorDim: Dim
-    public var originDim: Dim
+    public var padToFourDim: Dim
     private var textureDesc: MTLTextureDescriptor!
     public var metalTexture: MTLTexture!
     var transpose: [Int] = [0, 1, 2, 3]
+    func toTensor() -> [Float32] {
+        guard padToFourDim.cout() == 4 else {
+            fatalError("- not support -")
+        }
+        return metalTexture.toTensor(dim: (n: padToFourDim[0], c: padToFourDim[1], h: padToFourDim[2], w: padToFourDim[3]))
+    }
+    func realNHWC() -> [Float32] {
+        guard padToFourDim.cout() == 4 else {
+            fatalError(" - not support - ")
+        }
+        return metalTexture.realNHWC(dim: (n: padToFourDim[0], h: padToFourDim[1], w: padToFourDim[2], c: padToFourDim[3]))
+    }
     func initTexture(device: MTLDevice, inTranspose: [Int] = [0, 1, 2, 3], computePrecision: ComputePrecision = .Float16) {
         transpose = inTranspose
-        let newDim = transpose.map { originDim[$0] }
+        let newDim = transpose.map { padToFourDim[$0] }
         let newLayout = transpose.map { layout.layoutWithDim[$0] }
...
...
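Review note: `initTexture` derives the texture's dimensions with `transpose.map { padToFourDim[$0] }`, so slot `k` of the result takes the size found at index `transpose[k]` of the padded shape. A small sketch of that permutation on a plain `[Int]` follows. The function name `permute` and the validity check are my additions and do not appear in the commit.

```swift
// Hypothetical helper showing the permutation used for newDim (illustration only).
func permute(_ dims: [Int], by transpose: [Int]) -> [Int] {
    // Added safety check (not in the commit): transpose must be a permutation of the indices.
    precondition(transpose.sorted() == Array(0..<dims.count), "transpose must be a permutation")
    return transpose.map { dims[$0] }
}

print(permute([1, 2, 3, 4], by: [0, 2, 3, 1]))
```

The default `[0, 1, 2, 3]` leaves the shape as it is, while `[0, 2, 3, 1]` moves the entry at index 1 to the last position, which is the NCHW to NHWC reordering that the `delogOutput` branches distinguish.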
@@ -93,7 +107,7 @@ public class Texture<P: PrecisionType>: Tensorial {
         }
         tensorDim = inDim
         dim = fourDim
-        originDim = fourDim
+        padToFourDim = fourDim
         layout = DataLayout.init([(.N, fourDim[0]), (.C, fourDim[1]), (.H, fourDim[2]), (.W, fourDim[3])])
     }
...
...