OpenHarmony / Third Party Openssl
Commit bc5b136c
Authored Mar 04, 2011 by Andy Polyakov

ghash-x86.pl: optimize for Sandy Bridge.

Parent: 16cb0d95

Showing 1 changed file with 19 additions and 9 deletions

crypto/modes/asm/ghash-x86.pl (+19 -9)
@@ -103,6 +103,16 @@
 # providing access to a Westmere-based system on behalf of Intel
 # Open Source Technology Centre.
+# January 2010
+#
+# Tweaked to optimize transitions between integer and FP operations
+# on same XMM register, PCLMULQDQ subroutine was measured to process
+# one byte in 2.07 cycles on Sandy Bridge, and in 2.12 - on Westmere.
+# The minor regression on Westmere is outweighed by ~15% improvement
+# on Sandy Bridge. Strangely enough attempt to modify 64-bit code in
+# similar manner resulted in almost 20% degradation on Sandy Bridge,
+# where original 64-bit code processes one byte in 1.95 cycles.
 $0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1;
 push(@INC,"${dir}","${dir}../../perlasm");
 require "x86asm.pl";
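The cycles-per-byte figures in the new header comment can be turned into approximate throughput with a one-line division. The sketch below is only an illustration: the 3.0 GHz clock is a hypothetical value chosen for the arithmetic, not something stated in the commit, and the labels are paraphrases of the comment text.

```python
def throughput_bytes_per_second(cycles_per_byte: float, clock_hz: float) -> float:
    """Convert a cycles-per-byte cost into bytes per second at a given clock."""
    return clock_hz / cycles_per_byte


# Hypothetical clock frequency, assumed only for illustration.
ASSUMED_CLOCK_HZ = 3.0e9

# Cycles-per-byte figures quoted in the commit's header comment.
MEASURED_CYCLES_PER_BYTE = {
    "Sandy Bridge, this code after the tweak": 2.07,
    "Westmere, this code after the tweak": 2.12,
    "Sandy Bridge, original 64-bit code": 1.95,
}

if __name__ == "__main__":
    for label, cpb in MEASURED_CYCLES_PER_BYTE.items():
        rate = throughput_bytes_per_second(cpb, ASSUMED_CLOCK_HZ)
        print(f"{label}: {cpb} cycles/byte, about {rate / 1e9:.2f} GB/s")
```

At the assumed clock, 2.07 cycles per byte works out to roughly 1.45 GB/s and 1.95 cycles per byte to roughly 1.54 GB/s. The ratio between the two figures does not depend on the clock chosen.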
@@ -829,8 +839,8 @@ my ($Xhi,$Xi,$Hkey)=@_;
 	&pclmulqdq	($Xi,$Hkey,0x00);		#######
 	&pclmulqdq	($Xhi,$Hkey,0x11);		#######
 	&pclmulqdq	($T1,$T2,0x00);		#######
-	&pxor		($T1,$Xi);		#
-	&pxor		($T1,$Xhi);		#
+	&xorps		($T1,$Xi);		#
+	&xorps		($T1,$Xhi);		#
 	&movdqa		($T2,$T1);		#
 	&psrldq		($T1,8);
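The three `pclmulqdq` calls followed by two XORs in this hunk have the shape of a Karatsuba carry-less multiplication: low×low (selector `0x00`), high×high (selector `0x11`), and a product of the folded halves, which is then corrected by XORing in the other two products. The Python sketch below models that arithmetic on plain integers. It assumes that `$T1` and `$T2` already hold the XOR of the two 64-bit halves of each operand, which the hunk itself does not show, and the function names are hypothetical. Because XOR gives the same bits whether it is encoded as `pxor` or `xorps`, the substitution in this commit does not change the computed value.

```python
MASK64 = (1 << 64) - 1


def clmul(a: int, b: int) -> int:
    """Carry-less product of two non-negative integers (polynomials over GF(2))."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def clmul128_karatsuba(x: int, h: int) -> int:
    """128x128-bit carry-less product built from three 64x64-bit products."""
    x_lo, x_hi = x & MASK64, x >> 64
    h_lo, h_hi = h & MASK64, h >> 64

    lo = clmul(x_lo, h_lo)                  # models pclmulqdq with selector 0x00
    hi = clmul(x_hi, h_hi)                  # models pclmulqdq with selector 0x11
    mid = clmul(x_lo ^ x_hi, h_lo ^ h_hi)   # product of the folded halves

    # The two XORs that the commit re-encodes from pxor to xorps.
    mid ^= lo
    mid ^= hi

    return (hi << 128) ^ (mid << 64) ^ lo


if __name__ == "__main__":
    x = 0x0123456789ABCDEFFEDCBA9876543210
    h = 0x0F1E2D3C4B5A69788796A5B4C3D2E1F0
    # The three-multiplication form must agree with the direct product.
    assert clmul128_karatsuba(x, h) == clmul(x, h)
    print("Karatsuba result matches the schoolbook carry-less product")
```

The sketch covers only the multiplication step. The reduction modulo the GHASH polynomial that follows in the real code is outside this hunk and is not modelled here.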
@@ -950,7 +960,7 @@ my ($Xhi,$Xi) = @_;
 	&movdqu		($Xi,&QWP(0,$Xip));
 	&movdqa		($T3,&QWP(0,$const));
-	&movdqu		($Hkey,&QWP(0,$Htbl));
+	&movups		($Hkey,&QWP(0,$Htbl));
 	&pshufb		($Xi,$T3);
 	&clmul64x64_T2	($Xhi,$Xi,$Hkey);
@@ -993,7 +1003,7 @@ my ($Xhi,$Xi) = @_;
 	&pxor		($Xi,$T1);		# Ii+Xi
 	&clmul64x64_T2	($Xhn,$Xn,$Hkey);	# H*Ii+1
-	&movdqu		($Hkey,&QWP(16,$Htbl));	# load H^2
+	&movups		($Hkey,&QWP(16,$Htbl));	# load H^2
 	&lea		($inp,&DWP(32,$inp));	# i+=2
 	&sub		($len,0x20);
@@ -1002,7 +1012,7 @@ my ($Xhi,$Xi) = @_;
 &set_label("mod_loop");
 	&clmul64x64_T2	($Xhi,$Xi,$Hkey);	# H^2*(Ii+Xi)
 	&movdqu		($T1,&QWP(0,$inp));	# Ii
-	&movdqu		($Hkey,&QWP(0,$Htbl));	# load H
+	&movups		($Hkey,&QWP(0,$Htbl));	# load H
 	&pxor		($Xi,$Xn);		# (H*Ii+1) + H^2*(Ii+Xi)
 	&pxor		($Xhi,$Xhn);
@@ -1043,9 +1053,9 @@ my ($Xhi,$Xi) = @_;
 	&pxor		($Xi,$T2);		#
 	&pclmulqdq	($T1,$T3,0x00);		#######
-	&movdqu		($Hkey,&QWP(16,$Htbl));	# load H^2
-	&pxor		($T1,$Xn);		#
-	&pxor		($T1,$Xhn);		#
+	&movups		($Hkey,&QWP(16,$Htbl));	# load H^2
+	&xorps		($T1,$Xn);		#
+	&xorps		($T1,$Xhn);		#
 	&movdqa		($T3,$T1);		#
 	&psrldq		($T1,8);
@@ -1069,7 +1079,7 @@ my ($Xhi,$Xi) = @_;
 	&test		($len,$len);
 	&jnz		(&label("done"));
-	&movdqu		($Hkey,&QWP(0,$Htbl));	# load H
+	&movups		($Hkey,&QWP(0,$Htbl));	# load H
 &set_label("odd_tail");
 	&movdqu		($T1,&QWP(0,$inp));	# Ii
 	&pshufb		($T1,$T3);